[R] discriminant function analysis in R

2005-03-30 Thread Graham Jones
In message [EMAIL PROTECTED], r-help-
[EMAIL PROTECTED] writes
Dear R Users,

I'm very very interested in learning how to use R to carry out a 
classification of data using discriminant function analysis.  I've 
found the MASS package and the lda function, but the examples in the 
help system are a bit over my head.  I'm not exactly sure how to 
interpret the output, for example, or whether the inputs I've chosen are 
best suited to my needs.
[...]
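For reference, here is a minimal sketch of the lda() call the question refers to. It uses R's built-in iris data as a stand-in, since the poster's own data are not shown. The components printed are the ones usually read first: the class priors, the group means, and the coefficients of the linear discriminants.

```r
library(MASS)

# Fit a linear discriminant analysis; iris is only a stand-in data set
fit <- lda(Species ~ ., data = iris)

fit$prior    # prior probabilities (by default, the class proportions)
fit$means    # group means of each predictor
fit$scaling  # coefficients of the linear discriminants (LD1, LD2)

# Resubstitution predictions: class labels and posterior probabilities
pred <- predict(fit)
head(pred$posterior)
table(predicted = pred$class, actual = iris$Species)
```

Accuracy measured on the training data is optimistic. Calling lda() with `CV = TRUE` returns leave-one-out classes and posteriors instead, which gives a fairer idea of whether the chosen inputs suit the problem.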

I would recommend writing your own simple version of lda in R. For
example, stick to two-class problems, and don't worry too much about
efficiency or dealing with bad input. Then think about how you might
make your routines of more general use (but don't bother to implement
this). This is a good way of learning R, and having got this far on your
own, you will find the documentation and examples for lda make sense.
Well, it worked for me.

Here are some useful functions:
?"%*%"
?t
?determinant
?solve
?mean
?cov
?cat
?scan
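As a starting point for that exercise, here is one possible two-class version built mostly from the functions listed above. It is a sketch under simple assumptions (equal priors, a numeric feature matrix, no input checking beyond the two-class requirement). The names lda2 and predict_lda2 are hypothetical, and iris restricted to two species serves only as a test case.

```r
# Two-class linear discriminant, written for learning rather than speed.
# X: numeric matrix of features; y: labels with exactly two classes.
lda2 <- function(X, y) {
  y <- factor(y)
  stopifnot(nlevels(y) == 2)
  X1 <- X[y == levels(y)[1], , drop = FALSE]
  X2 <- X[y == levels(y)[2], , drop = FALSE]
  m1 <- apply(X1, 2, mean)
  m2 <- apply(X2, 2, mean)
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  # Pooled within-class covariance matrix
  S <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)
  # Discriminant direction w = S^{-1} (m1 - m2), without forming the inverse
  w <- solve(S, m1 - m2)
  # With equal priors, the threshold is midway between the projected means
  c0 <- drop(t(w) %*% (m1 + m2)) / 2
  list(w = w, c0 = c0, levels = levels(y))
}

# Assign to the first class when the discriminant score is positive
predict_lda2 <- function(fit, X) {
  score <- drop(X %*% fit$w) - fit$c0
  factor(ifelse(score > 0, fit$levels[1], fit$levels[2]), levels = fit$levels)
}

# Test case: setosa versus versicolor from the built-in iris data
d <- droplevels(iris[iris$Species != "virginica", ])
X <- as.matrix(d[, 1:4])
fit <- lda2(X, d$Species)
pred <- predict_lda2(fit, X)
table(predicted = pred, actual = d$Species)
```

For two classes, the direction w should agree, up to sign and scale, with the LD1 column of the `scaling` component returned by MASS::lda(), which is one way to check a home-made version against the packaged one.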

-- 
Graham Jones, author of SharpEye Music Reader
http://www.visiv.co.uk
21e Balnakeil, Durness, Lairg, Sutherland, IV27 4PT, Scotland, UK

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


FYI from apple employee RE: [R] Memory error in Mac OS X Aqua GUI v1.01 with cluster

2005-02-25 Thread Graham Jones
Betty Gilbert wrote:

[...]
I'm trying to cluster a matrix 
I created with a simulation with dimensions
dim(nca35)
[1] 1048112
[...]
But I'm still getting errors like the following with functions in the 
cluster package
[...]

I think that the xcluster function in the ctc package from Bioconductor
is specifically designed to deal with data sets of this size
without using much memory. 
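A rough calculation shows why memory runs out: hierarchical clustering methods typically need all pairwise dissimilarities, and an R "dist" object stores the n(n - 1)/2 lower-triangle values as 8-byte doubles. The row count in the quoted dim() output did not survive archiving, so n = 10000 below is only an illustration, and dist_mb is a hypothetical helper.

```r
# Approximate size in MB of a "dist" object over n observations:
# n * (n - 1) / 2 doubles at 8 bytes each
dist_mb <- function(n) n * (n - 1) / 2 * 8 / 2^20

dist_mb(10000)  # roughly 381 MB before any working copies are made
```

Algorithms that copy or transform the dissimilarities can need several times this amount, which is where a method designed to avoid the full matrix helps.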



[R] Graphics (crashes under Windows)

2005-02-23 Thread Graham Jones
In message [EMAIL PROTECTED], r-help-
[EMAIL PROTECTED] writes

The R platform that I installed on my Windows XP crashes every time that
I try to run some sophisticated graphics (e.g. Demo Graphics). Is that
to do with the configuration? Shall I reinstall it? 

You may have a buggy video driver. If you go to Control Panel, Display,
Settings, Advanced, Troubleshoot, and reduce the hardware acceleration,
it may fix the problem. (Maybe it is worth adding this trick to the R
for Windows FAQ?)



Re: Off topic -- large data sets. Was RE: [R] 64 Bit R Background Question

2005-02-16 Thread Graham Jones
In message [EMAIL PROTECTED], Prof Brian
Ripley [EMAIL PROTECTED] writes

But Bert's caveats apply: you have 200 problems of size 20,000 since in 
QDA each class's distribution is estimated separately, and a single pass 
will give you the sufficient statistics however large the dataset is.


I think we've interpreted Bert's question differently. I am not saying I
need to have vast amounts of data in RAM, or in a single data structure,
or anything like that, and I am not saying I need a 64-bit version of R.
What I am saying is that if I had 40 million cases for a problem like
the one I described, I'd want to use all of them when designing a
classifier.
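Ripley's point about sufficient statistics can be made concrete. For LDA or QDA, each class only needs its count, its column sums and its cross-product matrix, and those can be accumulated chunk by chunk, so every case contributes without all of them being held in memory at once. The function names below are hypothetical, and the chunks come from an in-memory matrix purely for testing; in practice they might be read a block at a time from a file or database.

```r
# Per-class running totals: count, column sums, and cross-product matrix
new_stats <- function(p) list(n = 0, s = numeric(p), ss = matrix(0, p, p))

# Fold one chunk of rows (a numeric matrix) into the running totals
update_stats <- function(st, X) {
  st$n  <- st$n + nrow(X)
  st$s  <- st$s + colSums(X)
  st$ss <- st$ss + crossprod(X)  # t(X) %*% X
  st
}

# Turn the totals into a mean vector and an unbiased covariance matrix
finish_stats <- function(st) {
  m <- st$s / st$n
  S <- (st$ss - st$n * tcrossprod(m)) / (st$n - 1)
  list(mean = m, cov = S)
}

# Example: ten chunks of 100 rows, compared against cov() on the full matrix
set.seed(1)
X <- matrix(rnorm(3000), ncol = 3)
st <- new_stats(ncol(X))
for (idx in split(seq_len(nrow(X)), rep(1:10, each = 100))) {
  st <- update_stats(st, X[idx, , drop = FALSE])
}
res <- finish_stats(st)
all.equal(res$cov, cov(X))
```

In a QDA setting one such accumulator would be kept per class. The raw sum-of-squares formula can lose precision when the means are large relative to the spread; centering the data or using Welford-style updates is more robust.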

Patrick Burns, if you're reading: OCR = optical character recognition.



Off topic -- large data sets. Was RE: [R] 64 Bit R Background Question

2005-02-15 Thread Graham Jones
In message [EMAIL PROTECTED], r-help-
[EMAIL PROTECTED] writes

Can someone give me an example (perhaps in a private response, since I'm off
topic here) where one actually needs all cases in a large data set (large
being > 1e6, say) to do a STATISTICAL analysis? By statistical I exclude,
say, searching for some particular characteristic like an adverse event in a
medical or customer repair database, etc. Maybe a definition of
statistical is: anything that cannot be routinely done in a single-pass
database query.

If the dimensionality of the data is large, you may need a large number
of cases too. An example from my own experience would be using quadratic
discriminant analysis (with regularization) for classifying symbols for
an OCR program. With 200 classes and 100 features, I'd really like many
millions of cases. I've been using about 20,000 per class or 4 million
in total, but if I had 40 million it would probably work better.
Compared to many applications in pattern recognition and data mining, I
think this is a fairly small example. 
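The need for so many cases follows from a parameter count: with 200 classes and 100 features, each class has 100 mean parameters and 100 * 101 / 2 = 5050 covariance parameters, about 1.03 million parameters in total. Regularized QDA can take several forms. The sketch below assumes one common choice, in the spirit of Friedman's regularized discriminant analysis, where each class covariance is shrunk toward the pooled covariance by a user-chosen lambda. The function names, equal priors and lambda = 0.2 are illustrative assumptions, and iris stands in for real OCR features.

```r
# Fit: class means, class covariances shrunk toward the pooled covariance
rqda_fit <- function(X, y, lambda = 0.2) {
  y <- factor(y)
  classes <- levels(y)
  means <- lapply(classes, function(k) colMeans(X[y == k, , drop = FALSE]))
  covs  <- lapply(classes, function(k) cov(X[y == k, , drop = FALSE]))
  ns    <- as.numeric(table(y))
  pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, covs, ns)) /
    (sum(ns) - length(classes))
  covs <- lapply(covs, function(S) (1 - lambda) * S + lambda * pooled)
  list(classes = classes, means = means, covs = covs)
}

# Predict with equal priors, using the quadratic discriminant score
#   -0.5 * log|S_k| - 0.5 * (x - m_k)' S_k^{-1} (x - m_k)
rqda_predict <- function(fit, X) {
  scores <- sapply(seq_along(fit$classes), function(k) {
    D <- sweep(X, 2, fit$means[[k]])          # centre rows on class mean
    Sinv_D <- solve(fit$covs[[k]], t(D))      # S_k^{-1} (x - m_k) per column
    logdet <- determinant(fit$covs[[k]], logarithm = TRUE)$modulus
    -0.5 * as.numeric(logdet) - 0.5 * colSums(t(D) * Sinv_D)
  })
  scores <- matrix(scores, nrow = nrow(X))    # keep n x K even when n == 1
  factor(fit$classes[max.col(scores, ties.method = "first")],
         levels = fit$classes)
}

X <- as.matrix(iris[, 1:4])
fit <- rqda_fit(X, iris$Species, lambda = 0.2)
pred <- rqda_predict(fit, X)
mean(pred == iris$Species)
```

lambda = 0 gives ordinary QDA and lambda = 1 gives an LDA-like rule with a shared covariance; in practice lambda would be chosen by cross-validation.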



[R] Name conflicts when passing arguments for one function to another

2005-01-29 Thread Graham Jones
I am fairly new to R. I find it surprising that

f <- function(x, a) {x - a}
uniroot(f, c(0,1), a=.5)

works, but

integrate(f, 0, 1, a=.5)

gives an error: Error in integrate(f, 0, 1, a = 0.5) : argument 4
matches multiple formal arguments

What is the best way of avoiding such surprises? Is there a way of
telling integrate() that the 'a' argument is for f()?
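One workaround for the integrate() call is to fix `a` inside an anonymous function, so that integrate() receives only a function of x and no extra named argument that could clash with its own arguments. A minimal sketch:

```r
f <- function(x, a) {x - a}

# uniroot() forwards a = 0.5 to f() through its `...` argument
r <- uniroot(f, c(0, 1), a = 0.5)
r$root

# Wrapping f so that `a` is fixed: integrate() sees only a function of x
i <- integrate(function(x) f(x, a = 0.5), 0, 1)
i$value
```

integrate() calls its integrand with a vector of x values, so the wrapped function must be vectorized; here it is, because x - a works elementwise.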

If I wrote my own function along the lines of uniroot() or integrate()
is there a better way of passing on arguments?
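One way to design such a function is to put `...` before the optional tuning arguments. R matches formal arguments that come after `...` only by their exact full name, so a caller's `a = 0.5` cannot partially match something like `abs.tol` or `aux`, and is passed through `...` to f() instead. The error quoted above is presumably that kind of clash, with `a` partially matching two formals. The bisection routine below is hypothetical, written only to illustrate the argument order; `my_bisect`, `abs.tol` and `aux` are illustrative names, not an existing API.

```r
# Optional settings come AFTER `...`, so they are matched only by exact name
my_bisect <- function(f, lower, upper, ..., abs.tol = 1e-10, aux = NULL) {
  # `aux` is unused; it only mirrors a formal whose name starts with "a"
  flo <- f(lower, ...)
  fhi <- f(upper, ...)
  if (flo * fhi > 0) stop("f(lower) and f(upper) must have opposite signs")
  while (upper - lower > abs.tol) {
    mid <- (lower + upper) / 2
    fmid <- f(mid, ...)
    if (flo * fmid <= 0) {
      upper <- mid          # root lies in [lower, mid]
    } else {
      lower <- mid          # root lies in [mid, upper]
      flo <- fmid
    }
  }
  (lower + upper) / 2
}

f <- function(x, a) {x - a}
root <- my_bisect(f, 0, 1, a = 0.5)  # `a` reaches f() via `...`
root
```

The trade-off is that callers must spell out `abs.tol` in full; abbreviations such as `abs = 1e-6` would also be swallowed by `...` and passed on to f().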
