[R] Trouble building R2.1.0 from source on Linux: package VR

2005-06-19 Thread Frank Gibbons
Hi,

Following on from suggestions made last week, I decided to install R 2.1.0 
on my Linux machine. I'm running into a problem there however, as shown:

make[1]: Entering directory `/d0/home/fgibbons/tmp/R2.1.0/R-2.1.0/src/library/Recommended'
make[2]: Entering directory `/d0/home/fgibbons/tmp/R2.1.0/R-2.1.0/src/library/Recommended'
begin installing recommended package VR
WARNING: ignoring environment value of R_HOME
tar: Skipping to next header
tar: Archive contains obsolescent base-64 headers
  incomplete literal tree

gzip: VR.tgz: invalid compressed data--format violated
tar: Error exit delayed from previous errors
ERROR: cannot extract package from 'VR.tgz'
make[2]: *** [VR.ts] Error 1
make[2]: Leaving directory `/d0/home/fgibbons/tmp/R2.1.0/R-2.1.0/src/library/Recommended'
make[1]: *** [recommended-packages] Error 2
make[1]: Leaving directory `/d0/home/fgibbons/tmp/R2.1.0/R-2.1.0/src/library/Recommended'
make: *** [stamp-recommended] Error 2

This is most unusual - I must have built R from source five or six times 
going back to 1.5.0 and don't recall any problems like this. Does anyone 
have any suggestions about where I might look for the source of this problem?
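
The gzip message points at VR.tgz itself rather than at the build machinery, so one thing worth ruling out is a damaged download. A minimal sketch, assuming a second, freshly downloaded copy of the tarball is available to compare against; `same_file` is a hypothetical helper, and the temporary files below merely stand in for the two tarballs:

```r
# Compare two files by MD5 checksum; returns FALSE if either is unreadable.
same_file <- function(a, b) {
  sums <- tools::md5sum(c(a, b))
  !any(is.na(sums)) && sums[[1]] == sums[[2]]
}

# Self-contained demonstration with temporary files.
f1 <- tempfile(); f2 <- tempfile(); f3 <- tempfile()
writeLines("same bytes", f1)
writeLines("same bytes", f2)
writeLines("different bytes", f3)

identical_copy <- same_file(f1, f2)  # two copies with the same content
damaged_copy   <- same_file(f1, f3)  # content differs
print(c(identical_copy = identical_copy, damaged_copy = damaged_copy))
```

If the two checksums differ for the real tarballs, unpacking a fresh copy of the sources would be the obvious next step.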

In particular, I'm interested in using package 'vsn', whose installation is 
(I believe) blocked by this problem. I downloaded the source from CRAN this 
afternoon, choosing the version 2.1.0 'stable' code. I don't have sysadmin 
privileges on the build machine, but this has never been a problem before - 
I just pass --prefix=$HOME to the configure script. FWIW, I believe the 
sysadmins use Debian.

This is a pretty time-critical matter for me (I wouldn't have chosen to 
upgrade now, were it not for my earlier problem with merge when there are 
empty labels), so any assistance would be greatly appreciated. Thanks in 
advance.

-Frank

PhD, Computational Biologist,
Harvard Medical School BCMP/SGM-322, 250 Longwood Ave, Boston MA 02115, USA.
Tel: 617-432-3555   Fax: 617-432-3557
http://llama.med.harvard.edu/~fgibbons

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] possible bug in merge with duplicate blank names in 'by' field.

2005-06-17 Thread Frank Gibbons
Thanks for your quick responses, Gabor and Brian.

I'm currently running R version 1.9.1 on Linux. Actually, I have just 
tested this on R v.2.1.0 running under Windows XP, and indeed, as you both 
indicate, the problem does not exist on that version for that OS. So, at an 
appropriate time I'll upgrade my Linux installation to the most recent 
version (1.9.1 is a year old, I guess).

-Frank

At 03:26 AM 6/17/2005, Prof Brian Ripley wrote:
What version of R is this (please do see the posting guide)?

In both 2.1.0 and 2.1.1 beta I get

all
  Promoter ip.x ip.y ip
1            30   40 40
2            40   40 40
3        a   10   NA NA
4        c   20   20 20
5        b   NA   15 15
6        d   NA   30 30

so cannot reproduce your result. Are you sure that the `blanks' really are 
empty and not some character that is printing as empty on your unstated OS?

BTW ' ' is what is normally called `blank'.

BTW, these are not `names' but character strings: `names' has other 
meanings in R.



[R] possible bug in merge with duplicate blank names in 'by' field.

2005-06-16 Thread Frank Gibbons
Run this:

p <- c('a', 'c', '', ''); a <- c(10, 20, 30, 40)
d1 <- data.frame(Promoter=p, ip=a) # Note duplicate empty names in p.
p <- c('b', 'c', 'd', ''); a <- c(15, 20, 30, 40)
d2 <- data.frame(Promoter=p, ip=a)
all <- merge(x=d1, y=d2, by="Promoter", all=T)
all <- merge(x=all, y=d2, by="Promoter", all=T)
all

Data is this:

d1
  Promoter ip
1        a 10
2        c 20
3          30
4          40

d2
  Promoter ip
1        b 15
2        c 20
3        d 30
4          40

Output looks like this:

  Promoter ip.x ip.y ip
1            40   30 30
2            40   40 30
3            40   30 40
4            40   40 40
5        b   15   NA NA
6        c   20   20 20
7        d   30   NA NA
8        a   NA   10 10

The weird thing about this is (in my view) that each instance of '' is 
considered unique, so with each successive merge, all combinatorial 
possibilities are explored, like a SQL outer join (Cartesian product). For 
non-empty names, an inner join is performed.

Dealing with genomic data (10^4 datapoints), it's easy to have a couple of 
blanks buried in the middle of things, and to combine several replicates 
with successive merges. I couldn't understand how my three replicates of 
6000 points, in which I expected substantial overlap in the labels, were 
taking so long to merge and ultimately generating 57000 labels. The culprit 
turned out to be a few hundred blanks buried in the middle.
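
A quick way to spot this kind of problem before merging is to count blank and duplicated keys in each data frame. A minimal sketch on the toy d1 from this message; `key_report` is a hypothetical helper name, not a function from R itself:

```r
d1 <- data.frame(Promoter = c("a", "c", "", ""), ip = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)

# Summarise how many keys are blank and how many repeat an earlier key.
key_report <- function(df, key) {
  k <- as.character(df[[key]])
  c(rows       = length(k),
    blank      = sum(!is.na(k) & k == ""),
    duplicated = sum(duplicated(k)))
}

report <- key_report(d1, "Promoter")
print(report)
```

Any non-zero count in the duplicated column means a merge on that key can return more rows than either input.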

Why does the empty (null) name merit special treatment? Perhaps I'm 
missing something. I hesitate to submit this as a bug, since technically I 
guess you could say that blank names, especially duplicates, are not 
kosher. But on the other hand, this combinatorial behaviour seems to occur 
only for blanks.
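
One possible workaround, assuming rows without a label can simply be discarded, is to drop blank keys before merging. This is only a sketch on the toy data above; `drop_blank_keys` is a hypothetical helper, and whether discarding those rows is acceptable depends on the data:

```r
d1 <- data.frame(Promoter = c("a", "c", "", ""), ip = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)
d2 <- data.frame(Promoter = c("b", "c", "d", ""), ip = c(15, 20, 30, 40),
                 stringsAsFactors = FALSE)

# Hypothetical helper: keep only rows whose key is neither NA nor "".
drop_blank_keys <- function(df, key) {
  k <- as.character(df[[key]])
  df[!is.na(k) & k != "", , drop = FALSE]
}

merged <- merge(drop_blank_keys(d1, "Promoter"),
                drop_blank_keys(d2, "Promoter"),
                by = "Promoter", all = TRUE)
print(merged)
```

With the blanks removed, each remaining label appears once in the result, so successive merges no longer multiply rows.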

-Frank



Re: [R] lda source code

2003-10-01 Thread Frank Gibbons
Wei Geng,

I asked the same question about six weeks ago, so let me try to answer it. 
The source for the entire package 'MASS' is in a single file, I believe (at 
least this is true on my Linux setup). The exact location of that file 
you'll have to determine by searching the directory/folder where you 
installed it. The function 'lda' is implemented entirely in R itself, like 
much of its functionality. Look at the functions 'lda' and 'predict.lda' in 
this file for details.
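
The same code can also be inspected from within an R session, without hunting for the installed file. A minimal sketch, assuming the MASS package is installed (it normally ships with R as a recommended package); the variable names are just for illustration:

```r
library(MASS)

print(methods("lda"))  # list the methods behind the lda generic

# Fetch the functions that do the work.
lda_default <- getS3method("lda", "default")
predict_lda <- getS3method("predict", "lda")

# Show the first few lines of each function's source.
writeLines(head(deparse(lda_default), 5))
writeLines(head(deparse(predict_lda), 5))
```

Printing `lda_default` or `predict_lda` in full shows the whole function body.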

Comments are sparse in most R code, from what I gather, but if you look in 
Pattern Recognition and Neural Networks by Brian Ripley (one of the 
authors of this package), you'll find a discussion in section 2.4 
'Predictive classification' that covers much of what's going on, from what 
I've been able to glean.

I hope that helps. There are certainly others out there who are more au 
fait with this than me.

-Frank Gibbons

At 05:47 PM 10/1/2003, you wrote:
Wei Geng wrote:

I am new to R. Trying to find out how lda() {in MASS R1.8.0 Windows} was
implemented in R. Does anyone know where to find out lda source code ?
Thanks.

Here:

http://cran.r-project.org

Hint: MASS is a *package*.  You want to view its *source*.

Same with most other R packages.  Or just about anything else you want to 
know about R.

Cheers

Jason
--
Indigo Industrial Controls Ltd.
http://www.indigoindustrial.co.nz
64-21-343-545
[EMAIL PROTECTED]


[R] Cook-distance-type plot (vertical bars)

2003-08-28 Thread Frank Gibbons
Hi,

Figure 13 of Emmanuel Paradis's R for Beginners was produced by termplot 
working on an aov object. The lower right-hand plot is labelled Cook's 
distance plot, and I'd really like to produce a similar type of figure, 
but in a totally different context. (I'm not even sure what this kind of 
figure is called, perhaps an impulse plot, where instead of a point at 
(x,y), there's a vertical bar running from the x-axis up to where the point 
would be).

Can anyone give me pointers on where to look for more info? I've had a look 
in the places I could think of (plot.lm.R, termplot.R, plot.R, aov.R), and 
couldn't find anything. Maybe I overlooked it?

Thanks

-Frank Gibbons



Re: [R] Cook-distance-type plot (vertical bars)

2003-08-28 Thread Frank Gibbons
Thanks to all who responded, and so promptly too: it works exactly as you 
describe.
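
The replies themselves are not reproduced here. For reference, one standard way to get this kind of vertical-bar plot in base graphics is type = "h" in plot(); whether that is exactly what the responders suggested is an assumption on my part. A minimal sketch with made-up values:

```r
x <- 1:10
y <- c(0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.7, 0.05, 0.6, 0.35)  # made-up values

# type = "h" draws a vertical line from zero up to each y value.
plot(x, y, type = "h", xlab = "Index", ylab = "Value")
points(x, y, pch = 16)  # optional: mark the top of each bar
```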


 Figure 13 of Emmanuel Paradis's R for Beginners was produced by termplot
 working on an aov object.

No, it was produced by plot() working on an aov object, as its caption
indicates.  The termplot() is Figure 14.

Thomas Lumley is quite right, it's produced by plot() (not termplot()), and 
it is mentioned on p31 of R for Beginners. My mistake.

In the interest of self education, is there a more comprehensive source for 
plot-types that I should read? Ideally, this would be something with lots 
of figures, so that I could browse the figures to find what I want to do, 
and then look up how to do it. R for Beginners goes some way along this 
path, but perhaps there's something more comprehensive?

Thanks again,

-Frank



Re: [R] discriminant function

2003-08-26 Thread Frank Gibbons
Stefan,

I asked the same question last week. As Brian Ripley, its author, and 
others said then, the only way to see what's going on is to read the code. 
It's pretty complicated statistically (that's why the performance is so 
good!); many of the details are in chapter 2 of Pattern Recognition and 
Neural Networks. The upshot is that it's really not that easy to come up 
with a simple equation to encompass all of what lda.R does, unfortunately.

You can get the scaling factors programmatically for the centered variables 
by running coef() on the LDA object, or just by using 
'lda_object$scaling'. That's a start, but there's a long way to go from 
there.
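
A minimal sketch of pulling those pieces out of a fitted object. The iris data set and the name `fit` are my choices for illustration, not anything from the original question:

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)  # iris ships with R

print(fit$scaling)  # coefficients of the linear discriminants
print(coef(fit))    # coef() on the fitted object
print(fit$means)    # per-class means of each variable
print(fit$prior)    # prior probabilities used for the fit
```

The means and priors are needed in addition to the scaling matrix if the goal is to reproduce what predict() returns.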

-Frank

At 09:24 AM 8/26/2003, you wrote:
How can I extract the linear discriminant functions resulting from a LDA
analysis?
The coefficients are listed as a result from the analysis but I have not
found a way to extract these programmatically. No references in the
archives were found.
Thank you very much,

Stefan



[R] LDA in R: how to extract full equation, especially constant term

2003-08-21 Thread Frank Gibbons
Hi,

Having dipped my toe into R a few times over the last year or two, in the 
last few weeks I've been using it more and more; I'm now a thorough 
convert. I've just joined the list, because although it's great, I do have 
this problem...

I'm using linear discriminant analysis for binary classification, and am 
happy with the classification performance using predict(). What I'd like to 
do now is extract the equation for this classifier, for use elsewhere (in 
Perl/Python code).

I know that I can get the means and scaling factors from the fitted lda 
object, but I'm having trouble computing the constant term. From reading 
Venables & Ripley and Hastie/Tibshirani/Friedman, I know the priors play 
a role in adjusting the cut-point from zero (for equally sized classes), 
based on the relative sizes of the two classes. But when I try to do the 
computation, I don't get a value that agrees with that returned by predict().

I've seen a post about this problem in the past, but it was never really 
answered by anyone who was familiar with R/S-PLUS. Can anyone help me with 
this? I guess I'm really wondering how R is computing the constant term in 
its discriminant function.
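
For the two-class case, the following is a sketch of one way to reconstruct the constant term. It rests on my reading of the default "plug-in" method in predict.lda, which appears to center the data on the prior-weighted mean of the class means before projecting. That reading is an assumption, so the code checks itself against predict(). The data set (a two-class subset of iris) and all variable names are chosen purely for illustration:

```r
library(MASS)

# Two-class example with unequal class sizes (30 versicolor, 50 virginica).
dat <- droplevels(subset(iris, Species != "setosa"))
dat <- dat[-(1:20), ]
fit <- lda(Species ~ ., data = dat)

X <- as.matrix(dat[, colnames(fit$means)])
w <- fit$scaling[, 1]                          # LD1 coefficients
centre <- colSums(fit$prior * fit$means)       # prior-weighted mean of class means

z  <- drop(sweep(X, 2, centre) %*% w)          # LD1 score of each observation
zm <- drop(sweep(fit$means, 2, centre) %*% w)  # LD1 score of each class mean

# Log posterior odds of class 2 versus class 1, as a function of z.
d     <- unname(zm[2] - zm[1])
const <- unname(-0.5 * (zm[2]^2 - zm[1]^2) + log(fit$prior[2] / fit$prior[1]))
log_odds <- z * d + const

# The same rule written on the raw variables: sum(slope * x) + intercept.
slope     <- w * d
intercept <- const - sum(centre * slope)

# Check both forms against predict().
pred <- predict(fit, newdata = dat)
stopifnot(isTRUE(all.equal(unname(plogis(log_odds)),
                           unname(pred$posterior[, 2]))))
stopifnot(isTRUE(all.equal(unname(drop(X %*% slope) + intercept),
                           unname(log_odds))))
print(c(slope, intercept = intercept))
```

An observation is assigned to the second class when the log odds are positive, so `slope` and `intercept` are the numbers to carry over into Perl or Python. The log of the prior ratio is where the class sizes shift the cut-point away from zero.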

Thanks,

-Frank Gibbons
