Re: [R] Quick partitioning

2005-08-26 Thread Prof Brian Ripley
On Thu, 25 Aug 2005, Anna Oganyan wrote:

 Hello,
 I am quite new to R, and I have one problem:
 I have large d-dimensional data sets (d=2, 3, 6, 10). I would like to
 divide the d-dim space into n (n may be 10, but better some larger
 number, for example 20) equally sized d-dim hypercubes and count how
 many data points are in each cube. Is there any way to do it quickly,
 I mean, in a reasonable time? Actually, I want to get some rough idea
 of the underlying densities of these data and compare them.
 Thanks a lot!
 Anna

How do you divide a 10-dimensional space into 10 hypercubes?  You would 
need to leave at least some of the dimensions undivided.

The general idea is easy: apply cut() to each dimension, so your 
dimensions become factors, then table() will produce the counts.  That 
will be quick enough for millions of points.
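For example, a minimal sketch of that recipe in two dimensions (the data and
grid size here are made up for illustration):

```r
set.seed(1)
x <- matrix(runif(2e5), ncol = 2)   # 100,000 points in [0,1]^2

# cut() each dimension into 20 equal-width intervals (factors)...
f <- lapply(seq_len(ncol(x)), function(j) cut(x[, j], breaks = 20))

# ...then table() the factors to count points per cell
counts <- table(f[[1]], f[[2]])
dim(counts)    # 20 x 20
sum(counts)    # 100000: each point lands in exactly one cell
```

For general d, `do.call(table, f)` gives the full d-way array of counts.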

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Fitting data to gaussian distributions

2005-08-26 Thread Prof Brian Ripley
See also the book MASS, the one that fitdistr is support software for.

On Thu, 25 Aug 2005, Luis Gracia wrote:

 Hi again,

 self-answered. I took a breath and started another google search, this
 time more successful. I found the following packages in case somebody
 has the same question:
 nor1mix
 wle
 mixdist (I found this one to be the most useful in my case)

 Best,

 Luis

 Luis Gracia said the following on 08/25/05 20:31:
 Hi!

 I need to fit data that show up as two partially superimposed
 Gaussians to the corresponding Gaussian distributions, e.g.
 
 data <- c(rnorm(100, 5, 2), rnorm(100, -6, 1))

 I figured out how to do it with mle or fitdistr when only one
 gaussian is necessary, but not with two or more. Is there a function in
 R to do this?

 Thank you very much in advance,

 Luis

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] question about coxph (was covariance matrix under null)

2005-08-26 Thread Prof Brian Ripley
You will need to tell us of _what_ you want the covariance matrix and what 
you mean by the `null hypothesis':  coxph does estimation, not testing.

If you want the covariance matrix of the parameter estimates, see vcov(), 
which has a coxph method.
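For instance (a sketch using the ovarian data that ships with the survival
package; the choice of data and covariates is purely illustrative):

```r
library(survival)

# fit a Cox PH model, then extract the covariance matrix of the
# coefficient estimates with the coxph method for vcov()
fit <- coxph(Surv(futime, fustat) ~ age + rx, data = ovarian)

V <- vcov(fit)          # 2 x 2: one row/column per coefficient
se <- sqrt(diag(V))     # standard errors, as reported by summary(fit)
```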

On Thu, 25 Aug 2005, Devarajan, Karthik wrote:


 Hello

 I am fitting a Cox PH model using the function coxph(). Does anyone know how
 to obtain the estimate of the covariance matrix under the null hypothesis.
 The function coxph.detail() does not seem to be useful for this purpose.

 Thanks,
 KD.

   [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

PLEASE do (no HTML mail, useful subject, please)

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Help about R

2005-08-26 Thread Martin Maechler
>>>>> "Don" == Don MacQueen <[EMAIL PROTECTED]>
>>>>>     on Thu, 25 Aug 2005 08:11:42 -0700 writes:

Don Also, for the three dimensional graphic,
Don help.search("3d")
Don will lead to a reference to the cloud() function in the lattice
Don package.

Don I don't remember if the lattice package is installed by default. 

it is, since it's recommended.
In such a situation (as in quite a few others), please consider
using

   packageDescription("lattice")

  Package: lattice
  Version: 0.12-5
  Date: 2005/08/16
  Priority: recommended 
  ^^
  Title: Lattice Graphics
  Author: Deepayan Sarkar [EMAIL PROTECTED]
  
  

where I've added the ^^ markup.

Don If not, you will have to install it.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html




[R] learning decision trees with one's own scoring functions

2005-08-26 Thread zhihua li

Hi netters,

I want to learn a decision tree from a series of instances (learning data).
The packages tree and rpart can do this quite well, but the scoring
functions (splitting criteria) are fixed in these packages, like Gini or
similar. However, I'm going to use another scoring function.

At first I wanted to modify the R code of tree or rpart and put my own
scoring function in. But it seems that tree and rpart perform the splitting
procedure by calling external C functions, which I have no access to. So do
I have to write R code from scratch to build the tree with my own scoring
functions? It's a really tough task. Or are there other R packages that can
do similar things with more flexible and extensible code?


Thanks a lot!

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] learning decision trees with one's own scoring functions

2005-08-26 Thread Philippe Grosjean
Hello,

You have access to the C code of the function in the *source* of the
package. You can modify it and recompile the package and function (it's
better, though, to give it a different name!).
Best,

Philippe Grosjean

..°}))
 ) ) ) ) )
( ( ( ( (Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons-Hainaut University, Pentagone (3D08)
( ( ( ( (Academie Universitaire Wallonie-Bruxelles
 ) ) ) ) )   8, av du Champ de Mars, 7000 Mons, Belgium
( ( ( ( (
 ) ) ) ) )   phone: + 32.65.37.34.97, fax: + 32.65.37.30.54
( ( ( ( (email: [EMAIL PROTECTED]
 ) ) ) ) )
( ( ( ( (web:   http://www.umh.ac.be/~econum
 ) ) ) ) )  http://www.sciviews.org
( ( ( ( (
..

zhihua li wrote:
 Hi netters,
 
 I want to learn a decision tree from a series of instances (learning 
 data). The packages tree and rpart can do this quite well, but the 
 scoring functions (splitting criteria) are fixed in these packages, 
 like Gini or similar. However, I'm going to use another scoring 
 function.
 At first I wanted to modify the R code of tree or rpart and put my own 
 scoring function in. But it seems that tree and rpart perform the 
 splitting procedure by calling external C functions, which I have no 
 access to. So do I have to write R code from scratch to build the tree 
 with my own scoring functions? It's a really tough task. Or are there 
 other R packages that can do similar things with more flexible and 
 extensible code?
 
 Thanks a lot!
 
 
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

[R] Truncate levels to use randomForest

2005-08-26 Thread Martin Lam
Hi,

I will explain my problem with this example:

library(randomForest)

# load the iris plant data set
dataset <- iris

numberarray <- array(1:nrow(dataset), nrow(dataset), 1)

# include only instances with Species == "setosa" or "virginica"
indices <- t(numberarray[(dataset$Species == "setosa" |
    dataset$Species == "virginica") == TRUE])

finaldataset <- dataset[indices, ]

# just to let you see the 3 classes
levels(finaldataset$Species)

# create the random forest
randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)

# The error message I get:
# Error in randomForest.default(m, y, ...) :
#   Can't have empty classes in y.

# The problem is that finaldataset doesn't contain any instances of
# versicolor, so I think the only way to solve this problem is by
# changing the levels of Species to only "setosa" and "virginica";
# correct me if I'm wrong.

# So I tried to change the levels but I got stuck:

# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))

# the problem!
newlevels <- list(uniqueItems[1] = c(uniqueItems[1], uniqueItems[2]),
    uniqueItems[3] = uniqueItems[3])

# Error message:
# Error: syntax error

# In the help they use constant names to rename the levels, so this
# works (but that's not what I want, because I don't want to change the
# code every time I use another data set):
newlevels <- list(setosa = c(uniqueItems[1], uniqueItems[2]),
    virginica = uniqueItems[3])

levels(finaldataset$Species) <- newlevels

levels(finaldataset$Species)

finaldataset$Species

---

Thanks in advance,

Martin
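A common way to drop the now-empty level without hard-coding any level names
is simply to call factor() again after subsetting (a sketch; note this drops
versicolor as a level rather than merging levels):

```r
# subset iris to two species; the Species factor still carries all 3 levels
finaldataset <- iris[iris$Species %in% c("setosa", "virginica"), ]

# re-deriving the factor keeps only the levels that actually occur
finaldataset$Species <- factor(finaldataset$Species)

levels(finaldataset$Species)   # "setosa" "virginica"
```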

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] A. Mani : Tapply

2005-08-26 Thread Petr Pikal
Hi 


** PLEASE do read the posting guide! ** 

On 25 Aug 2005 at 20:28, A Mani wrote: 

 Hello, 
 Is it safe to use tapply when the result will be of dim 2 
 x 1 or more?  On my PC R crashes.  

or gave you an error message? 

I tried this 

 df <- data.frame(A=as.factor(sample(1:1,10,  
replace=T)), B=as.factor(sample(10:11,10,  
replace=T)), num=rnorm(10)) 
 ttt <- tapply(df$num, list(df$A,df$B), diff) 

and got 

Error: cannot allocate vector of size 390664 Kb 
In addition: Warning messages: 
1: Reached total allocation of 1000Mb: see help(memory.size)  
2: Reached total allocation of 1000Mb: see help(memory.size)  
 

and the result with smaller data sets are 

df1 <- data.frame(A=as.factor(sample(1:2,10, 
replace=T)), B=as.factor(sample(10:11,10, replace=T)), 
num=rnorm(10)) 
ttt1 <- tapply(df1$num, list(df1$A,df1$B), diff) 

 ttt1 
  10            11
1 Numeric,24933 Numeric,25141 
2 Numeric,24992 Numeric,24930 


df <- data.frame(A=as.factor(sample(1:1000,10, 
replace=T)), B=as.factor(sample(1:11000,10, 
replace=T)), num=rnorm(10)) 
 ttt <- tapply(df$num, list(df$A,df$B), diff) 


 
 ttt[1:2,1:5] 
  1 10001 10002 10003 10004 
1 NULL  NULL  NULL  Numeric,0 NULL  
2 NULL  NULL  NULL  Numeric,0 NULL  
 

so you are probably getting a humongous table of NULLs, zeros 
and a few nonzero entries. 

You probably need to use a different approach. 

Cheers 
Petr 


 The code used was on a 3-col data frame with two factor cols and a 
 numeric column. The fn was diff. Data form being A, B, Num: 
 tapply(data$A, list(data$A, data$B), diff) 
  
 --  
 A. Mani 
 Member, Cal. Math. Soc 
  
 __ 
 R-help@stat.math.ethz.ch mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help 
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html 

Petr Pikal
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Code in lattice::dotplot function.

2005-08-26 Thread Ernesto Jardim
Deepayan Sarkar wrote:

On 8/25/05, ernesto [EMAIL PROTECTED] wrote:
  

Hi,

I'm trying to understand the code of lattice functions so that I can
write some S4 methods using lattice. The following code is a snipet of
dotplot that is reused in several other functions. I don't understand
why this is needed can someone help ?



It was a hack to enable usage of the form dotplot(x). The latest
version of lattice does not use this sort of construct any more
(replacing it by generics and methods).

Deepayan
  

OK, thanks.

EJ

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] chisq.,test`

2005-08-26 Thread Gavin Simpson
On Fri, 2005-08-26 at 18:15 +1000, Stephen Choularton wrote:
 Hi
  
  I am trying to do this:

I get no syntax error on R 2.1.1-patched. I do get an error though:

Error in chisq.test(c(11, 13, 12, 18, 21, 43, 15, 12, 9, 10, 5, 28,
22,  :
probabilities must sum to 1.

Which leads us to question the values you provided as argument p (which
I assigned to a vector P first):

sum(P)
[1] 0.999783

So you need to make sure your probabilities sum to one.
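For example, one way to rescale the vector so it sums to one (P here is a
made-up stand-in for the posted probability vector):

```r
P <- c(0.3, 0.2, 0.25, 0.249783)  # sums to 0.999783, like the poster's p
p <- P / sum(P)                   # rescale so the probabilities sum to 1
sum(p)
```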

HTH

PS. It would have been helpful if you'd posted the syntax error you
received and your version of R (type 'version' at the R prompt).

Gav

snip 
 but I keep on getting "syntax error".  Is this too long, or somehow
 badly formed?
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [T] +44 (0)20 7679 5522
ENSIS Research Fellow [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography   [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way[W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] chisq.,test`

2005-08-26 Thread Stephen Choularton
Hi
 
 I am trying to do this:
 
chisq.test(c(11, 13, 12, 18, 21, 43, 15, 12, 9, 10, 5, 28, 22, 11, 15,
11, 18, 28, 16, 8, 15, 19, 44, 18, 11, 23, 15, 23, 2, 5, 4, 14, 3, 22,
9, 0, 6, 19, 15, 32, 3, 16, 14, 10, 24, 16, 24, 31, 29, 28, 16, 26, 11,
11, 4, 17, 16, 13, 20, 26, 16, 19, 34, 19, 17, 14, 22, 25, 17, 12, 23,
14, 19, 30, 18, 10, 23, 21, 17, 16, 10, 14, 6, 17, 17, 10, 21, 25, 20,
4, 11, 4, 11, 12, 21, 13, 12, 21, 8, 20, 8, 13, 18, 24, 19, 23, 17, 17,
10, 12, 5, 16, 1, 8, 14, 39, 13, 6, 6, 23, 11, 23, 21, 16, 18, 12, 16,
6, 8, 5, 9, 10, 9, 11, 20, 8, 12, 0, 14, 24, 30, 17, 17, 18, 43, 14, 36,
15, 20, 10, 19, 14, 19, 28, 21, 18, 24, 16, 13, 16, 11, 24, 28, 14, 23,
11, 19, 25, 26, 34, 16, 9, 18, 13, 27, 14, 16, 21, 10, 18, 23, 16, 39,
16, 35, 24, 32, 23, 19, 28, 19, 21, 32, 16, 21, 54, 23, 42, 33, 27, 19,
26, 22, 22, 19, 19, 19, 45, 14, 41, 15, 18, 27, 25, 23, 67, 7, 12, 26,
25, 26, 22, 12, 125, 70, 48, 17, 20, 34, 15, 25, 45, 32, 42, 29, 107,
14, 17, 39, 34, 45, 65, 46, 27, 31, 28, 26, 27, 24, 39, 26, 16, 28, 28,
17, 32, 32, 35, 25, 44, 21, 20, 21, 32, 30, 29, 18, 86, 32, 56, 30, 31,
35, 17, 19, 27, 22, 23, 44, 15, 18, 34, 32, 21, 32, 22, 19, 34, 21, 8,
19, 32, 31, 35, 19, 16, 17, 36, 39, 29, 31, 42, 30, 42, 22, 25, 20, 52,
21, 36, 27, 13, 27, 118, 18, 12, 19, 16, 60, 22, 25, 22, 24, 11, 53, 1,
27, 34, 39, 43, 22, 21, 36, 43, 20, 49, 45, 34, 95, 10, 13, 27, 21, 20,
39, 26, 27, 23, 24, 20, 21, 14, 26, 31, 31, 23, 24, 19, 52, 50, 26, 16,
78, 29, 34, 15, 59, 20, 43, 10, 29, 41, 23, 26, 26, 27, 8, 35, 34, 16,
42, 31, 35, 36, 33, 43, 19, 24, 21, 24, 18, 32, 33, 35, 17, 13, 25, 34,
7, 54, 34, 92, 27, 24, 19, 13, 16, 22, 18, 15, 19, 17, 31, 14, 32),
p=c(0.0016, 0.002752, 0.001728, 0.0016, 0.001792, 0.005953, 0.006081,
0.001792, 0.002432, 0.0016, 0.001984, 0.002624, 0.001984, 0.002432,
0.001728, 0.002304, 0.002176, 0.003008, 0.001792, 0.0016, 0.0016,
0.001664, 0.0032, 0.003328, 0.0016, 0.002112, 0.001472, 0.001728,
0.001728, 0.002432, 0.001856, 0.0016, 0.001536, 0.002624, 0.003072,
0.001664, 0.002624, 0.003264, 0.004417, 0.002816, 0.001536, 0.00288,
0.002048, 0.002368, 0.002624, 0.002048, 0.003008, 0.002432, 0.003008,
0.003584, 0.001856, 0.00256, 0.001728, 0.001856, 0.002048, 0.00288,
0.001984, 0.0016, 0.001728, 0.002432, 0.001472, 0.001792, 0.003328,
0.003136, 0.00256, 0.002496, 0.00192, 0.002048, 0.00192, 0.001792,
0.002432, 0.0016, 0.001728, 0.002624, 0.0016, 0.001728, 0.001728,
0.001664, 0.002304, 0.001472, 0.002176, 0.0016, 0.000512, 0.001792,
0.001856, 0.002432, 0.0016, 0.001856, 0.001856, 0.001664, 0.001856,
0.00192, 0.00128, 0.001664, 0.00256, 0.001728, 0.001856, 0.002176,
0.001472, 0.001728, 0.001664, 0.001408, 0.001728, 0.002688, 0.00224,
0.004289, 0.001728, 0.001408, 0.001984, 0.001344, 0.001664, 0.001024,
0.001984, 0.001536, 0.0016, 0.008066, 0.001152, 0.001472, 0.001536,
0.002048, 0.001792, 0.003969, 0.002432, 0.00192, 0.003008, 0.003008,
0.00224, 0.001856, 0.001344, 0.0016, 0.00224, 0.002176, 0.001664,
0.0016, 0.001664, 0.001664, 0.001664, 0.00128, 0.002048, 0.002432,
0.002944, 0.001728, 0.00192, 0.001984, 0.003328, 0.001536, 0.002624,
0.001984, 0.001728, 0.002176, 0.00192, 0.001856, 0.001984, 0.00224,
0.001856, 0.0016, 0.001792, 0.001344, 0.001728, 0.001856, 0.001536,
0.001792, 0.00256, 0.002112, 0.002816, 0.001792, 0.001728, 0.001984,
0.002048, 0.002752, 0.001984, 0.001728, 0.001536, 0.002432, 0.00224,
0.001664, 0.001664, 0.001856, 0.0016, 0.002176, 0.001792, 0.001472,
0.003648, 0.00192, 0.003264, 0.001792, 0.003136, 0.00224, 0.002112,
0.002368, 0.001664, 0.001792, 0.003328, 0.002368, 0.001792, 0.007553,
0.002944, 0.004033, 0.002752, 0.002944, 0.001856, 0.002304, 0.00224,
0.001728, 0.0016, 0.001344, 0.002368, 0.003392, 0.001856, 0.003584,
0.001408, 0.001984, 0.003136, 0.00192, 0.00288, 0.005633, 0.001408,
0.001664, 0.00288, 0.0032, 0.002752, 0.002304, 0.001664, 0.00973,
0.005633, 0.003392, 0.001664, 0.001792, 0.002432, 0.002048, 0.001984,
0.003392, 0.002368, 0.004929, 0.002176, 0.012739, 0.001984, 0.001664,
0.00352, 0.004161, 0.004993, 0.004865, 0.004417, 0.002368, 0.0032,
0.001792, 0.002112, 0.00224, 0.0016, 0.003328, 0.002304, 0.001344,
0.001856, 0.003072, 0.001664, 0.002432, 0.002496, 0.002688, 0.002176,
0.003008, 0.001664, 0.001856, 0.0016, 0.002624, 0.002176, 0.002368,
0.001472, 0.008962, 0.002496, 0.003904, 0.002368, 0.002496, 0.002624,
0.001472, 0.002048, 0.002048, 0.001536, 0.002304, 0.003072, 0.001536,
0.0016, 0.002368, 0.002304, 0.00192, 0.002304, 0.0016, 0.001536,
0.003008, 0.0016, 0.001984, 0.002176, 0.004993, 0.002368, 0.002304,
0.001344, 0.002368, 0.001728, 0.003136, 0.004033, 0.00224, 0.002112,
0.002944, 0.001984, 0.003392, 0.002048, 0.001984, 0.001728, 0.003776,
0.001856, 0.002752, 0.002624, 0.0016, 0.002816, 0.012163, 0.00352,
0.001344, 0.001728, 0.001856, 0.005505, 0.001664, 0.001728, 0.0016,
0.001664, 0.002112, 0.00384, 0.001792, 0.00192, 0.002368, 0.002752,
0.004097, 0.0016, 0.001536, 0.00256, 0.003264, 0.002112, 0.004353,
0.003264, 0.00224, 0.00845, 0.001408, 0.001536, 

Re: [R] histogram method for S4 class.

2005-08-26 Thread Ernesto Jardim
Deepayan Sarkar wrote:

On 8/24/05, ernesto [EMAIL PROTECTED] wrote:
  

Hi,

I'm trying to develop an histogram method for a class called FLQuant
which is used by the package FLCore (http://flr-project.org). FLQuant is
an extension to array. There is an as.data.frame method that coerces
flquant into a data.frame suitable for lattice plotting. The problem is
that when I coerce the object and plot it after it works but if the
method is applied within the histogram method it does not work. See the
code below (the FLCore package is here
http://prdownloads.sourceforge.net/flr/FLCore_1.0-1.tar.gz?download)



library(FLCore)
  

Loading required package: lattice


data(ple4)
histogram(~data|year, [EMAIL PROTECTED])
  

Error in inherits(x, "factor") : Object "x" not found


histogram(~data|year, data=as.data.frame([EMAIL PROTECTED]))
  

The catch.n slot is a FLQuant object and the code for histogram is the
following

setMethod("histogram", signature(formula="formula", data="FLQuant"),
function (formula, data = parent.frame(), allow.multiple =
is.null(groups) || outer,
outer = FALSE, auto.key = FALSE, aspect = "fill", panel =
"panel.histogram", prepanel = NULL,
scales = list(), strip = TRUE, groups = NULL, xlab, xlim, ylab,
ylim,
type = c("percent", "count", "density"),
nint = if (is.factor(x)) length(levels(x)) else
round(log2(length(x)) + 1),
endpoints = extend.limits(range(x[!is.na(x)]), prop = 0.04),
breaks = if (is.factor(x)) seq(0.5, length = length(levels(x)) +
1) else do.breaks(endpoints, nint),
equal.widths = TRUE, drop.unused.levels =
lattice.getOption("drop.unused.levels"), ...,
default.scales = list(), subscripts = !is.null(groups), subset =
TRUE) {

qdf <- as.data.frame(data)

histogram(formula, data = qdf, allow.multiple = allow.multiple,
outer = outer,
auto.key = auto.key, aspect = aspect, panel = panel,
prepanel = prepanel, scales = scales,
strip = strip, groups = groups, xlab=xlab, xlim=xlim,
ylab=ylab, ylim=ylim, type = type,
nint = nint, endpoints = endpoints, breaks = breaks,
equal.widths = equal.widths,
drop.unused.levels = drop.unused.levels, ..., default.scales
= default.scales,
subscripts = subscripts, subset = subset)
}
)


Any ideas ?



[I'm CC-ing to r-devel, please post follow-ups there]

What version of lattice are you using? Please use the latest one, in
which histogram is an S3 generic, with only one argument, formula. The
eventual solution to your problem may involve changing that, but the
first question to ask is whether any other formula makes sense in your
context (if not, I would rather keep one argument and dispatch on
signature(formula = "FLQuant")).

Disclaimer: I haven't actually had time to check out FLCore yet, I
will as soon as I can.

Deepayan
  

Hi,

I've installed the version that is distributed with R-2.1.1, 0.11-8. I 
see there's a new version now so I'll install it and check the results. 
I've developed the code a little more using the approach you use for 
dotplot (see below) and I know where the problem is now. I'm not able to 
pass the arguments nint, breaks and endpoints to the function call. I 
guess the problem is my programming skills :-(

Thanks

EJ

ps: I'm not a subscriber of r-devel so I guess I'm not able to post 
there, anyway I'm CC-ing there too.



setMethod("histogram", signature(formula="formula", data="FLQuant"), 
function (formula, data = parent.frame(), allow.multiple = 
is.null(groups) || outer, outer = FALSE, auto.key = FALSE, aspect = 
"fill", panel = "panel.histogram", prepanel = NULL, scales = list(), 
strip = TRUE, groups = NULL, xlab, xlim, ylab, ylim, type = c("percent", 
"count", "density"), nint = if (is.factor(x)) length(levels(x)) else 
round(log2(length(x)) + 1), endpoints = 
extend.limits(range(x[!is.na(x)]), prop = 0.04), breaks = if 
(is.factor(x)) seq(0.5, length = length(levels(x)) + 1) else 
do.breaks(endpoints, nint), equal.widths = TRUE, drop.unused.levels = 
lattice.getOption("drop.unused.levels"), ..., default.scales = list(), 
subscripts = !is.null(groups), subset = TRUE) {

# need to develop further; at the moment it is not possible to control 
# nint, breaks and endpoints.

data <- as.data.frame(data)

dots <- list(...)

groups <- eval(substitute(groups), data, parent.frame())
subset <- eval(substitute(subset), data, parent.frame())

call.list <- c(list(formula = formula, data = data, groups = groups, 
subset = subset, allow.multiple = allow.multiple, outer = outer, 
auto.key = auto.key, aspect = aspect, panel = panel, prepanel = 
prepanel, scales = scales, strip = strip, type = type, equal.widths = 
equal.widths, drop.unused.levels = drop.unused.levels, default.scales = 
default.scales, subscripts = subscripts), dots)

# include xlab & co if existent
if(!missing(xlab)) call.list$xlab <- xlab
if(!missing(ylab)) call.list$ylab <- ylab
if(!missing(xlim)) call.list$xlim <- xlim
if(!missing(ylim)) 

[R] Cumulative Function

2005-08-26 Thread Mark Miller
I was wondering if anyone could advise me on a good algorithm to turn a set of 
data from its original form into its cumulative form.  I have written a piece 
of code that takes the data and does essentially what a histogram function 
would do, except it adds to each new bin the sum in the previous bin.  Once 
that is done I divide by the frequency in the last bin plus 1.  I know the 
ecdf() function exists in R, but I want to use it to fit the cumulative 
distributions to the data, and ecdf() produces a non-subscriptable vector, so 
fitdistr() cannot be used on it.

Thanks for any help you can give
Mark Miller

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] chisq.,test`

2005-08-26 Thread Ted Harding
On 26-Aug-05 Stephen Choularton wrote:
 Hi
  
  I am trying to do this:
  
 chisq.test(c(11, 13, 12, 18, 21, 43, 15, 12, 9, 10, 5, 28, 22, 11, 15,
 [...]
 7, 54, 34, 92, 27, 24, 19, 13, 16, 22, 18, 15, 19, 17, 31, 14, 32),
 p=c(0.0016, 0.002752, 0.001728, 0.0016, 0.001792, 0.005953, 0.006081,
 [...]
 0.001664, 0.002304))
  
 but I keep on getting "syntax error".  Is this too long, or somehow
 badly formed?

Having cut-and-pasted your data as published in your email into R-1.8.0
on Red Hat Linux 9, I encountered no syntax error and got the
result:

  data:  c(11, 13, 12, 18, 21, 43, 15, 12, 9, 10, 5, 28, 22, 11, 15,
  [...]

  X-squared = 1094.448, df = 414, p-value < 2.2e-16

Fee: For each P-value of the form x.ye-N, N beers.

Hoping this helps ... though your syntax error remains mysterious.

Best wishes,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Aug-05   Time: 09:53:38
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Cumulative Function

2005-08-26 Thread Sean Davis
Will ?cumsum help?

Sean
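A sketch of that binned-cumulative idea, using cumsum() on hist() counts
(the sample data here are made up):

```r
set.seed(42)
x <- rnorm(1000)

h <- hist(x, breaks = 30, plot = FALSE)
cum <- cumsum(h$counts) / length(x)   # empirical CDF at h$breaks[-1]

# cum is an ordinary numeric vector, so unlike the function object
# returned by ecdf() it can be subscripted and compared directly to a
# candidate CDF such as pnorm(h$breaks[-1])
tail(cum, 1)   # 1: all observations lie at or below the last boundary
```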

On 8/26/05 5:15 AM, Mark Miller [EMAIL PROTECTED] wrote:

 I was wondering if anyone could advise me on a good algorithm to turn a set of
 data from its original form into its cumulative form.  I have written a piece
 of code that takes the data and does essentially what a histogram function
 would do, except it adds to each new bin the sum in the previous bin.  Once
 that is done I divide by the frequency in the last bin plus 1.  I know the
 ecdf() function exists in R, but I want to use it to fit the cumulative
 distributions to the data, and ecdf() produces a non-subscriptable vector, so
 fitdistr() cannot be used on it.
 
 Thanks for any help you can give
 Mark Miller
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] learning decision trees with one's own scoring functions

2005-08-26 Thread Prof Brian Ripley
Please do study the packages you mention a great deal more carefully 
before posting such negative remarks about them.

In particular, rpart is already fully user-extensible (and comes with a 
worked example), and both packages are supplied in source code on CRAN.
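To illustrate that extensibility, here is a sketch of an anova-style
criterion written entirely in R, passing user-written init/eval/split
functions as rpart's `method` argument. The argument conventions below are
reconstructed from memory of that interface, not taken from this thread;
treat them as assumptions and consult rpart's own user-split example for the
authoritative version.

```r
library(rpart)

# init: describe the response (one response value, one y column)
itemp <- function(y, offset, parms, wt) {
    list(y = y, parms = NULL, numresp = 1, numy = 1,
         summary = function(yval, dev, wt, ylevel, digits)
             paste("  mean =", format(signif(yval, digits))))
}

# eval: label and deviance for a node (weighted mean and RSS)
etemp <- function(y, wt, parms) {
    wmean <- sum(y * wt) / sum(wt)
    list(label = wmean, deviance = sum(wt * (y - wmean)^2))
}

# split: goodness of each of the n-1 ordered splits on a numeric x;
# this is the place where a custom scoring function would go
stemp <- function(y, wt, x, parms, continuous) {
    n <- length(y)
    y <- y - sum(y * wt) / sum(wt)   # center so the node mean is 0
    temp <- cumsum(y * wt)[-n]
    left.wt <- cumsum(wt)[-n]
    right.wt <- sum(wt) - left.wt
    lmean <- temp / left.wt
    rmean <- -temp / right.wt
    # reduction in RSS achieved by each candidate split point
    list(goodness = left.wt * lmean^2 + right.wt * rmean^2,
         direction = sign(lmean))
}

# numeric predictors only in this sketch (no categorical-split branch)
fit <- rpart(dist ~ speed, data = cars,
             method = list(init = itemp, eval = etemp, split = stemp))
fit
```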

On Fri, 26 Aug 2005, zhihua li wrote:

 Hi netters,

 I want to learn a decision tree from a series of instances (learning data). 
 The packages tree and rpart can do this quite well, but the scoring 
 functions (splitting criteria) are fixed in these packages, like Gini or 
 similar. However, I'm going to use another scoring function. 
 At first I wanted to modify the R code of tree or rpart and put my own 
 scoring function in. But it seems that tree and rpart perform the splitting 
 procedure by calling external C functions, which I have no access to. So do 
 I have to write R code from scratch to build the tree with my own scoring 
 functions? It's a really tough task. Or are there other R packages that can 
 do similar things with more flexible and extensible code?

 Thanks a lot!



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Help: lda predict

2005-08-26 Thread Shengzhe Wu
Hello,

I use lda (package: MASS) to obtain an lda object, then want to employ
this object to do the prediction for the new data like below:

predict(object, newdata, dimen=1, method=c("plug-in", "predictive", "debiased"))

What is the exact difference among the three methods? What is the
difference of prediction results when applying different method?

Thank you,
Shengzhe

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Modelling Financial Time Series with S-PLUS - Adv. Course 20th Sept '05

2005-08-26 Thread Kathy Kiely
Insightful are now taking bookings for the Advanced Time Series Modelling
course to be held at Carlton Terrace in London SW1 on  20th September. 

 

Advanced workshop 

Extract for Financial Time Series Modelling: The Advanced Time Series Course
focuses on the most up-to-date theory and its application around the
following topics (note that not all topics will be covered during the
workshop):

1.  The Econometrics of transaction level data 

o  Autoregressive Conditional Duration Models 

o  Measuring liquidity and market pressure 

o  Assessing predictability in tick-by-tick data: market depth and
liquidity 

2.  Extracting the Implied Probability Distributions from Option Prices 

o  Can options help predict the future spot market? 

3.  The Use of State Space models in Finance 

o  Measuring herding and market sentiment 

4.  Factor Models of Asset Returns 

o  Statistical and Economic Factors 

5.  GMM and EMM Estimation methods in S-PLUS 

o  Continuous time models in Finance 

o  Estimating Stochastic Volatility Models 

6.  Copula Methods in Finance 

o  Basic Copula testing and estimation 

o  Measuring joint risks 

o  Pricing Basket options 

Tutor: Mark Salmon, Professor of Finance, Warwick University

 

Mark Salmon has many years' experience of teaching and research in financial
econometrics, behavioural finance, asset pricing, Knightian uncertainty, risk
management and asset management. Formerly Deutsche Morgan Grenfell Professor
of Financial Markets at Cass Business School, London, Mark is also an advisor
to the Bank of England and has many links with City institutions. 

 

For full course details please go to: 

 

http://www.insightful.com/services/training/timeseries_UK05.asp 

 

To Register:

 

http://www.insightful.com/uk/services/register_uk.asp 

 

 

If you need any further information please do not hesitate to contact me.

 

Regards

 

Kathy Kiely,

 

 

 

Kathy Kiely

Sales & Marketing Assistant

Insightful Limited

Network House, Basing View Basingstoke, Hampshire, RG21 4HG

Tel  : 01256 339822

Fax : 01256 339839

e mail : [EMAIL PROTECTED] 

 

===

 

Beyond Excel(r): Quantitative Data Analysis with Insightful 


Date: Wednesday, August 31st at  4:00 PM  BST 

Speakers: Will Alexander, Wachovia Securities & David Smith, Insightful
Corporation


http://www.insightful.com/news_events/webcasts/2005/08excel/default.asp


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Help: lda predict

2005-08-26 Thread Prof Brian Ripley
On Fri, 26 Aug 2005, Shengzhe Wu wrote:

 I use lda (package: MASS) to obtain a lda object, then want to employ
 this object to do the prediction for the new data like below:

 predict(object, newdata, dimen=1, method=c("plug-in", "predictive", 
 "debiased"))

That is not how you call it: when a character vector is given like that
those are alternatives.  Do read the help page, as we ask.

 What is the exact difference among the three methods? What is the
 difference of prediction results when applying different method?

This is stated on the help page.  If you are unfamiliar with the area, 
note that the posting guide points out that MASS is support software for a 
book and the explanations are in the book.  The help page also has 
references: please do read them (before posting).

 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595



[R] Matrix oriented computing

2005-08-26 Thread Sigbert Klinke
Hi,

I want to compute the quantiles of Chi^2 distributions with different 
degrees of freedom like

x <- cbind(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- rbind(1:100)
m <- qchisq(x, df)

and hoped to get back  a  length(df) times length(x)  matrix with the 
quantiles. Since this does not work, I use

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- c(1:100)
m <- qchisq(x, df[1])
for (i in 2:length(df)) {
  m <- rbind(m, qchisq(x, df[i]))
}
dim(m) <- c(length(df), length(x))

Is there a way to avoid the for loop ?

Thanks Sigbert



[R] Help: lda predict

2005-08-26 Thread Shengzhe Wu
Thanks for your reply. Actually I called function as below.

p1 = predict(object, newdata, dimen=1)
p2 = predict(object, newdata, dimen=1, method="debiased")
p3 = predict(object, newdata, dimen=1, method="predictive")

The MAP classifications of the prediction results by any method are the
same. I know what the methods "plug-in" and "debiased" mean, but what
does the vague prior for the method "predictive" mean? What is
"vague" here?

Thank you,
Shengzhe



On 8/26/05, Prof Brian Ripley [EMAIL PROTECTED] wrote:
 On Fri, 26 Aug 2005, Shengzhe Wu wrote:
 
  I use lda (package: MASS) to obtain a lda object, then want to employ
  this object to do the prediction for the new data like below:
 
  predict(object, newdata, dimen=1, method=c("plug-in", "predictive", 
  "debiased"))
 
 That is not how you call it: when a character vector is given like that
 those are alternatives.  Do read the help page, as we ask.
 
  What is the exact difference among the three methods? What is the
  difference of prediction results when applying different method?
 
 This is stated on the help page.  If you are unfamiliar with the area,
 note that the posting guide points out that MASS is support software for a
 book and the explanations are in the book.  The help page also has
 references: please do read them (before posting).
 
  PLEASE do read the posting guide! 
  http://www.R-project.org/posting-guide.html
 
 --
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UK            Fax:  +44 1865 272595




Re: [R] Matrix oriented computing

2005-08-26 Thread Achim Zeileis
On Fri, 26 Aug 2005 14:44:10 +0200 Sigbert Klinke wrote:

 Hi,
 
 I want to compute the quantiles of Chi^2 distributions with different 
 degrees of freedom like
 
  x <- cbind(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
  df <- rbind(1:100)
  m <- qchisq(x, df)
 
 and hoped to get back  a  length(df) times length(x)  matrix with the 
 quantiles. Since this does not work, I use
 
  x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
  df <- c(1:100)
  m <- qchisq(x, df[1])
  for (i in 2:length(df)) {
    m <- rbind(m, qchisq(x, df[i]))
  }
  dim(m) <- c(length(df), length(x))
 
 Is there a way to avoid the for loop ?

You could use sapply():
  m <- sapply(x, function(x) qchisq(x, df))
hth,
Z

 Thanks Sigbert
 




Re: [R] Matrix oriented computing

2005-08-26 Thread Marc Schwartz
On Fri, 2005-08-26 at 14:44 +0200, Sigbert Klinke wrote:
 Hi,
 
 I want to compute the quantiles of Chi^2 distributions with different 
 degrees of freedom like
 
  x <- cbind(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
  df <- rbind(1:100)
  m <- qchisq(x, df)
 
 and hoped to get back  a  length(df) times length(x)  matrix with the 
 quantiles. Since this does not work, I use
 
  x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
  df <- c(1:100)
  m <- qchisq(x, df[1])
  for (i in 2:length(df)) {
    m <- rbind(m, qchisq(x, df[i]))
  }
  dim(m) <- c(length(df), length(x))
 
 Is there a way to avoid the for loop ?
 
 Thanks Sigbert

See ?sapply

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 
       0.95, 0.975, 0.99, 0.995)

df <- c(1:100)

mat <- sapply(x, qchisq, df)

> dim(mat)
[1] 100  11
 
> str(mat)
 num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...


HTH,

Marc Schwartz



Re: [R] Help: lda predict

2005-08-26 Thread Prof Brian Ripley
On Fri, 26 Aug 2005, Shengzhe Wu wrote:

 Thanks for your reply. Actually I called function as below.

  p1 = predict(object, newdata, dimen=1)
  p2 = predict(object, newdata, dimen=1, method="debiased")
  p3 = predict(object, newdata, dimen=1, method="predictive")

So why did you say something different?

  The MAP classifications of the prediction results by any method are the
  same. I know what the methods "plug-in" and "debiased" mean, but what
  does the vague prior for the method "predictive" mean? What is
  "vague" here?

Please do as we ask, and read the book for which this is supporting 
material (on p.339, to save you looking in the index).


 Thank you,
 Shengzhe



 On 8/26/05, Prof Brian Ripley [EMAIL PROTECTED] wrote:
 On Fri, 26 Aug 2005, Shengzhe Wu wrote:

 I use lda (package: MASS) to obtain a lda object, then want to employ
 this object to do the prediction for the new data like below:

  predict(object, newdata, dimen=1, method=c("plug-in", "predictive", 
  "debiased"))

 That is not how you call it: when a character vector is given like that
 those are alternatives.  Do read the help page, as we ask.

 What is the exact difference among the three methods? What is the
 difference of prediction results when applying different method?

 This is stated on the help page.  If you are unfamiliar with the area,
 note that the posting guide points out that MASS is support software for a
 book and the explanations are in the book.  The help page also has
 references: please do read them (before posting).

 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

 --
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
  Oxford OX1 3TG, UK            Fax:  +44 1865 272595




-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595



Re: [R] Matrix oriented computing

2005-08-26 Thread Patrick Burns
I believe that the following is what you want:

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
dof <- 1:100
ans <- outer(x, dof, qchisq)
dimnames(ans) <- list(x, dof)


Note that 'df' is not a very auspicious name for an object since
it is the name of a function. 

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

Sigbert Klinke wrote:

Hi,

I want to compute the quantiles of Chi^2 distributions with different 
degrees of freedom like

x <- cbind(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- rbind(1:100)
m <- qchisq(x, df)

and hoped to get back  a  length(df) times length(x)  matrix with the 
quantiles. Since this does not work, I use

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- c(1:100)
m <- qchisq(x, df[1])
for (i in 2:length(df)) {
  m <- rbind(m, qchisq(x, df[i]))
}
dim(m) <- c(length(df), length(x))

Is there a way to avoid the for loop ?

Thanks Sigbert





Re: [R] Help: lda predict

2005-08-26 Thread Shengzhe Wu
I compared the posteriors of these three prediction results; they are a
little different.

The book you mentioned should be "Modern Applied Statistics with S",
4th edition. But that book is currently on loan from our university
library to someone else, and I have checked the book "Pattern
Recognition and Neural Networks", which does not mention these three
lda prediction methods.

Thank you,
Shengzhe

On 8/26/05, Prof Brian Ripley [EMAIL PROTECTED] wrote:
 On Fri, 26 Aug 2005, Shengzhe Wu wrote:
 
  Thanks for your reply. Actually I called function as below.
 
  p1 = predict(object, newdata, dimen=1)
   p2 = predict(object, newdata, dimen=1, method="debiased")
   p3 = predict(object, newdata, dimen=1, method="predictive")
 
 So why did you say something different?
 
   The MAP classifications of the prediction results by any method are the
   same. I know what the methods "plug-in" and "debiased" mean, but what
   does the vague prior for the method "predictive" mean? What is
   "vague" here?
 
 Please do as we ask, and read the book for which this is supporting
 material (on p.339, to save you looking in the index).
 
 
  Thank you,
  Shengzhe
 
 
 
  On 8/26/05, Prof Brian Ripley [EMAIL PROTECTED] wrote:
  On Fri, 26 Aug 2005, Shengzhe Wu wrote:
 
  I use lda (package: MASS) to obtain a lda object, then want to employ
  this object to do the prediction for the new data like below:
 
   predict(object, newdata, dimen=1, method=c("plug-in", "predictive", 
   "debiased"))
 
  That is not how you call it: when a character vector is given like that
  those are alternatives.  Do read the help page, as we ask.
 
  What is the exact difference among the three methods? What is the
  difference of prediction results when applying different method?
 
  This is stated on the help page.  If you are unfamiliar with the area,
  note that the posting guide points out that MASS is support software for a
  book and the explanations are in the book.  The help page also has
  references: please do read them (before posting).
 
  PLEASE do read the posting guide! 
  http://www.R-project.org/posting-guide.html
 
  --
  Brian D. Ripley,  [EMAIL PROTECTED]
  Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
  University of Oxford, Tel:  +44 1865 272861 (self)
  1 South Parks Road, +44 1865 272866 (PA)
   Oxford OX1 3TG, UK            Fax:  +44 1865 272595
 
 
 
 
 --
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
  Oxford OX1 3TG, UK            Fax:  +44 1865 272595




Re: [R] Help: lda predict

2005-08-26 Thread Prof Brian Ripley
On Fri, 26 Aug 2005, Shengzhe Wu wrote:

 I compared the posteriors of these three prediction results; they are a
 little different.

 The book you mentioned should be "Modern Applied Statistics with S",
 4th edition. But that book is currently on loan from our university
 library to someone else,

So please do request and consult it, as you are using its support 
software.  Note that the posting guide does ask you to mention if you 
have no access to the references.

  and I have checked the book "Pattern
 Recognition and Neural Networks", which does not mention these three
 lda prediction methods.

It does, in detail, in sections 2.4 and 2.5.


 Thanks you,
 Shengzhe

 On 8/26/05, Prof Brian Ripley [EMAIL PROTECTED] wrote:
 On Fri, 26 Aug 2005, Shengzhe Wu wrote:

 Thanks for your reply. Actually I called function as below.

 p1 = predict(object, newdata, dimen=1)
 p2 = predict(object, newdata, dimen=1, method="debiased")
 p3 = predict(object, newdata, dimen=1, method="predictive")

 So why did you say something different?

 The MAP classifications of the prediction results by any method are the
 same. I know what the methods "plug-in" and "debiased" mean, but what
 does the vague prior for the method "predictive" mean? What is
 "vague" here?

 Please do as we ask, and read the book for which this is supporting
 material (on p.339, to save you looking in the index).


 Thank you,
 Shengzhe



 On 8/26/05, Prof Brian Ripley [EMAIL PROTECTED] wrote:
 On Fri, 26 Aug 2005, Shengzhe Wu wrote:

 I use lda (package: MASS) to obtain a lda object, then want to employ
 this object to do the prediction for the new data like below:

  predict(object, newdata, dimen=1, method=c("plug-in", "predictive", 
  "debiased"))

 That is not how you call it: when a character vector is given like that
 those are alternatives.  Do read the help page, as we ask.

 What is the exact difference among the three methods? What is the
 difference of prediction results when applying different method?

 This is stated on the help page.  If you are unfamiliar with the area,
 note that the posting guide points out that MASS is support software for a
 book and the explanations are in the book.  The help page also has
 references: please do read them (before posting).

 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

 --
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
  Oxford OX1 3TG, UK            Fax:  +44 1865 272595




 --
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
  Oxford OX1 3TG, UK            Fax:  +44 1865 272595




-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595



Re: [R] covariance matrix under null

2005-08-26 Thread Thomas Lumley
On Thu, 25 Aug 2005, Devarajan, Karthik wrote:


 Hello

 I am fitting a Cox PH model using the function coxph(). Does anyone know how
 to obtain the estimate of the covariance matrix under the null hypothesis.
 The function coxph.detail() does not seem to be useful for this purpose.


You can evaluate the second derivative of the partial loglikelihood at any 
specified beta with

   vcov(coxph(formula, data, iter=0, init=beta))

e.g. if you want to get score tests.
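A sketch of this suggestion in a runnable form (the data set, formula
and null value beta = 0 below are illustrative assumptions, not from
the thread):

```r
library(survival)

# Fix the coefficient at a chosen beta (here the null, beta = 0) by
# starting there and allowing zero Newton iterations; the returned
# information matrix is then evaluated at that beta.
fit0 <- coxph(Surv(futime, fustat) ~ age, data = ovarian,
              init = 0, iter = 0)

V0 <- vcov(fit0)   # covariance estimate (inverse information) under the null
fit0$score         # the corresponding score (log-rank type) test statistic
```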

-thomas



Re: [R] R-help Digest, Vol 30, Issue 26

2005-08-26 Thread Jean-Marc Ottorini
Dear R helpers,

   For me (i.e. R 2.1.1 on Mac OS X), using trellis.device 
(postscript, onefile = FALSE, etc.) with the lattice library within an R 
function works fine to obtain the desired graph as an EPS file, 
provided that:

1) the command dev.off() is not included in this function, and

2) it is issued at the command level after the function has 
been exited.

I would like to know if there is a way to close the EPS file within the 
function itself, freeing the user from issuing the closing command (I 
already tried trellis.device() and trellis.device(null), without any 
success).

Regards,

J.-M.

  
Jean-Marc Ottorini   LERFoB, UMR INRA-ENGREF 1092
  email  [EMAIL PROTECTED]  INRA - Centre de Nancy
  voice  +33-0383-394046F54280 - Champenoux
  fax+33-0383-394034 France



Re: [R] Matrix oriented computing

2005-08-26 Thread Peter Dalgaard
Marc Schwartz [EMAIL PROTECTED] writes:

 x - c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 
0.95, 0.975, 0.99, 0.995)
 
 df - c(1:100)
 
 mat - sapply(x, qchisq, df)

  dim(mat)
 [1] 100  11
  
  str(mat)
  num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...

outer() is perhaps a more natural first try... It does give the
transpose of the sapply approach, though. 

round(t(outer(x,df,qchisq)),2)

should be close. You should likely add dimnames.
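Putting the pieces together, a sketch of the outer() version with
dimnames, using the same x and df as in the question (the dimname
labels "df" and "p" are my own choice):

```r
x  <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- 1:100

# outer() returns length(x) x length(df); transpose for the layout
# the original poster asked for (rows = df, columns = probabilities)
m <- t(outer(x, df, qchisq))
dimnames(m) <- list(df = df, p = x)   # numeric labels are coerced to character

m["1", "0.995"]   # upper 99.5% point of chi^2 on 1 df, approx. 7.88
```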


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Shutting down a trellis plot (was R-help Digest, Vol 30, Issue 26)

2005-08-26 Thread Prof Brian Ripley
I suspect you have not print()-ed your graphics, see FAQ Q7.22.

It is then possible to include dev.off() within the function.  E.g.

testit <- function(fn = "test.eps")
{
   trellis.device(postscript, file = fn, onefile = FALSE, horizontal = FALSE)
   print(stripplot(voice.part ~ jitter(height), data = singer, aspect = 1,
                   jitter = TRUE, xlab = "Height (inches)"))
   dev.off()
}

testit()

works for me.


On Fri, 26 Aug 2005, Jean-Marc Ottorini wrote:

   For me (i.e. R 2.1.1 on Mac OS X), using trellis.device
  (postscript, onefile = FALSE, etc.) with the lattice library within an R
  function works fine to obtain the desired graph as an EPS file,
  provided that:

 1) the command dev.off() is not included in this function, and

 2) it is issued at the command level after the function has
  been exited.

  I would like to know if there is a way to close the EPS file within the
  function itself, freeing the user from issuing the closing command (I
  already tried trellis.device() and trellis.device(null), without any
  success).


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595



[R] passing arguments from nnet to optim

2005-08-26 Thread Tarca, Adi

Hi everyone,
According to R reference manual, the nnet function uses the BFGS method
of optim to optimize the neural network parameters.
I would like, when calling the function nnet to tell the optim function
not to produce the tracing information on the progress of the
optimization, or at least to reduce the frequency of the reports.
I tried the following:
a) nnet default
x <- rnorm(20)
y <- seq(0, 1, length=20)
s <- nnet(y ~ x, size=1)
# weights:  4
initial  value 1.910932 
iter  10 value 1.819382
iter  20 value 1.788736
iter  30 value 1.775778
iter  40 value 1.767771
iter  50 value 1.765063
iter  60 value 1.762631
iter  70 value 1.760670
iter  80 value 1.759349
iter  90 value 1.757801
iter 100 value 1.756290
final  value 1.756290 
stopped after 100 iterations

Report is generated at every 10 iterations.

b) passing the REPORT parameter to optim via the control argument
x <- rnorm(20)
y <- seq(0, 1, length=20)
s <- nnet(y ~ x, size=1, control=list(REPORT=50))
# weights:  4
initial  value 1.894905 
iter  10 value 1.672337
iter  20 value 1.658612
iter  30 value 1.654824
iter  40 value 1.653465
iter  50 value 1.652785
iter  60 value 1.652343
iter  70 value 1.652116
iter  80 value 1.651860
iter  90 value 1.651525
iter 100 value 1.651292
final  value 1.651292 
stopped after 100 iterations

It is still producing reports every 10 iterations. 
Does anyone have an idea how I can turn off the report generation, or at
least reduce its frequency?
Thanks,
Adi L. TARCA



Re: [R] Plotting nls

2005-08-26 Thread Ben Rich
To get nice looking plots you can use trellis plots from the lattice
package.  First you need:

library(lattice)

Then you can define a custom panel function that will overlay the
fitted curve on top of the data points in a different color (you just
need to do this once; the fit you want plotted is specified as an
argument):

pred.overlay.panel <- function(x, y, fit, ...)
{
   panel.grid()
   panel.xyplot(x, y, ...)
   form <- as.list(sys.call(-2))[[2]]$call$formula
   resp <- deparse(form[[2]])
   covar <- deparse(form[[3]])
   xx <- seq(min(x), max(x), len=101)
   newdat <- data.frame(xx)
   colnames(newdat) <- covar
   panel.superpose(xx, predict(fit, newdata=newdat),
   subscripts=1:length(xx), groups=factor(rep(2, length(xx)),
   levels=1:2), type="l", ...)
}

Finally, you use the custom panel function in a call to xyplot:

xyplot(y ~ x, data=sample, panel=pred.overlay.panel, fit=fit,
scales=list(x=list(log=TRUE)))


Note how you specify that you want the x-axis to be in log-scale with
the scales parameter.
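For a single fit, a base-graphics sketch gives the same overlay without
lattice (the column names x and y are assumed, and log = "x" requires
strictly positive x values):

```r
# Plot the raw data with a log-scaled x axis, then overlay the nls fit
# evaluated on a fine grid of x values.
xx <- seq(min(sample$x), max(sample$x), length = 101)
plot(y ~ x, data = sample, log = "x")
lines(xx, predict(fit, newdata = data.frame(x = xx)), col = "red")
```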

Hope this helps.

Ben

On 8/26/05, Lanre Okusanya [EMAIL PROTECTED] wrote:
 Kindly excuse a non-statistician newbie attempting to wrestle with R.
 
  This might be a relatively easy question, but I am trying to perform an nls
  regression and plot the fitted function through the data, superimposed on
  the raw data. From reading R-help, Rtips et al., I am only able to do
  that by extracting the parameter values manually and using them to create
  the plot.
  
  Is there an easier way to do this (I have ~60 plots), obtain an r^2,
  and also plot the x axis in the log domain (any attempts I have tried
  have failed)?
 
 NLS script
 
  fit <- nls(y ~ -emax*x^h/(ec50^h + x^h),
             data = sample, start = list(emax=4, h=2, ec50=1))
 
 summary(fit)
 
 Thank you all for your help
 
 Lanre Okusanya, Pharm.D.,BCPS
 UB/Pfizer Pharmacometrics Fellow
 University at Buffalo School of Pharmacy and Pharmaceutical Sciences
 237 Cooke Hall
 Buffalo, NY 14260
 Email: [EMAIL PROTECTED]
 Tel: (716)645-2828 x 275
 Fax: (716)645-2886
 




[R] basic anova and t-test question

2005-08-26 Thread Arne.Muller
Hello,

I'm posting this to get some comments/hints about a question that is more 
statistical than R-technical.

In an anova of an lme fit, the factor SSPos11 shows up as non-significant, but 
in the t-tests of the summary, 2 of the 4 levels (one per contrast) are 
significant. See below for some truncated output.

I realize that the two tests are different (F-test/t-test), but I'm looking 
for the meaning. Maybe you have a scenario that explains how these differences 
can arise and how you'd go ahead and analyse them further.

When I use SSPos11 as the only fixed effect, it is not significant in 
either the anova or the t-test, and a boxplot of the factor shows that the 
levels are all quite similar (similar variance and mean). Might the effect I 
observe be linked to an unbalanced design in the multifactorial model?

thanks a lot for your help,
+kind regards,

Arne

> anova(fit)
            numDF denDF  F-value p-value
(Intercept)     1   540 323.4442  <.0001
SSPos1          3   540  15.1206  <.0001
...
SSPos11         3   540   1.1902  0.3128
...

> summary(fit)
Linear mixed-effects model fit by REML
 Data: d.orig 
       AIC      BIC    logLik
  1007.066 1153.168 -469.5329

Random effects:
 Formula: ~1 | Method
(Intercept)  Residual
StdDev:   0.4000478 0.4943817

Fixed effects: log(value + 7.5) ~ SSPos1 + SSPos2 + SSPos6 + SSPos7 + SSPos10 + 
SSPos11 + SSPos13 + SSPos14 + SSPos18 + SSPos19 +  
  Value  Std.Error  DF   t-value p-value
(Intercept)   2.8621811 0.23125065 540 12.376964  0.
SSPos1C  -0.1647937 0.06293993 540 -2.618269  0.0091
SSPos1G  -0.3448095 0.05922479 540 -5.822047  0.
SSPos1T   0.1083988 0.06087095 540  1.780797  0.0755
...
SSPos11C -0.1540292 0.06171635 540 -2.495761  0.0129
SSPos11G -0.1428980 0.05993122 540 -2.384368  0.0175
SSPos11T -0.0039434 0.06133920 540 -0.064289  0.9488
...
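For intuition, here is a toy example (unrelated to the poster's data)
of the distinction being asked about: anova() gives one joint F-test
that pools all contrasts of a factor, while summary() gives separate
t-tests per contrast, so the two can disagree when only some contrasts
carry an effect:

```r
set.seed(1)
# four-level factor mimicking a nucleotide position, with effects
# confined to two of the three contrasts against the baseline "A"
g <- factor(rep(c("A", "C", "G", "T"), each = 30))
y <- rnorm(120) + 0.25 * (g == "C") - 0.25 * (g == "G")

fit <- lm(y ~ g)
anova(fit)           # one joint F-test for the 3 contrasts of g
summary(fit)$coef    # separate t-tests, each against the baseline "A"
```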





Re: [R] Matrix oriented computing

2005-08-26 Thread Marc Schwartz (via MN)
On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
 Marc Schwartz [EMAIL PROTECTED] writes:
 
  x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 
         0.95, 0.975, 0.99, 0.995)
  
  df <- c(1:100)
  
  mat <- sapply(x, qchisq, df)
 
  > dim(mat)
  [1] 100  11
  
  > str(mat)
   num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...
 
 outer() is perhaps a more natural first try... It does give the
 transpose of the sapply approach, though. 
 
 round(t(outer(x,df,qchisq)),2)
 
 should be close. You should likely add dimnames.



What I find interesting, is that I would have intuitively expected
outer() to be faster than sapply().  However:


> system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
 
> system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), 
              gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
 
# No round() or t() to test for overhead
> system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.01 0.00 0.02 0.00 0.00


# Bear in mind the round() on mat1 above
> all.equal(mat, mat1)
[1] "Mean relative difference: 4.905485e-05"

> all.equal(mat, t(mat2))
[1] TRUE


Even when increasing the size of 'df' to 1:1000:


> system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.16 0.01 0.16 0.00 0.00

> system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE)
[1] 0.16 0.00 0.18 0.00 0.00
 
# No round() or t() to test for overhead
> system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.16 0.01 0.17 0.00 0.00



It also seems that, at least in this case, t() and round() do not add
much overhead.

Best regards,

Marc



Re: [R] passing arguments from nnet to optim

2005-08-26 Thread Prof Brian Ripley
On Fri, 26 Aug 2005, Tarca, Adi wrote:


 Hi everyone,
 According to R reference manual, the nnet function uses the BFGS method
 of optim to optimize the neural network parameters.

What the help page says is

   '...': arguments passed to or from other methods.

That means methods of nnet().

  Optimization is done via the BFGS method of 'optim'.

but it is not calling optim, rather the C code implementing optim.

 I would like, when calling the function nnet to tell the optim function
 not to produce the tracing information on the progress of the
 optimization, or at least to reduce the frequency of the reports.
 I tried the following:
 a) nnet default
 x <- rnorm(20)
 y <- seq(0, 1, length=20)
 s <- nnet(y ~ x, size=1)
 # weights:  4
 initial  value 1.910932
 iter  10 value 1.819382
 iter  20 value 1.788736
 iter  30 value 1.775778
 iter  40 value 1.767771
 iter  50 value 1.765063
 iter  60 value 1.762631
 iter  70 value 1.760670
 iter  80 value 1.759349
 iter  90 value 1.757801
 iter 100 value 1.756290
 final  value 1.756290
 stopped after 100 iterations

 Report is generated at every 10 iterations.

 b) passing the REPORT parameter to optim via the control argument
 x <- rnorm(20)
 y <- seq(0, 1, length=20)
 s <- nnet(y ~ x, size=1, control=list(REPORT=50))
 # weights:  4
 initial  value 1.894905
 iter  10 value 1.672337
 iter  20 value 1.658612
 iter  30 value 1.654824
 iter  40 value 1.653465
 iter  50 value 1.652785
 iter  60 value 1.652343
 iter  70 value 1.652116
 iter  80 value 1.651860
 iter  90 value 1.651525
 iter 100 value 1.651292
 final  value 1.651292
 stopped after 100 iterations

 It is still producing reports every 10 iterations.
 Does anyone have an idea how I can turn off the report generation, or at
 least reduce its frequency?

You do it via the C code.
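That said, nnet() itself has a trace argument that switches the
iteration reports off entirely (it silences them rather than changing
the REPORT frequency):

```r
library(nnet)

x <- rnorm(20)
y <- seq(0, 1, length = 20)

# trace = FALSE suppresses the "iter ... value ..." lines completely
s <- nnet(y ~ x, size = 1, trace = FALSE)
```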

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK            Fax:  +44 1865 272595



[R] problem with certain data sets when using randomForest

2005-08-26 Thread Martin Lam
Hi,

Since I've had no replies to my previous post about my
problem, I am posting it again in the hope someone
notices it. The problem is that the randomForest
function doesn't accept data sets whose instances
cover only a subset of all the classes. So a data set
with instances that belong to either class "a"
or "b" from the levels "a", "b" and "c" doesn't work,
because there is no instance of class "c". Is
there any way to solve this problem?

library(randomForest)

# load the iris plant data set
dataset <- iris

numberarray <- array(1:nrow(dataset), nrow(dataset), 1)

# include only instances with Species = "setosa" or "virginica"
indices <- t(numberarray[(dataset$Species == "setosa" | 
                          dataset$Species == "virginica") == TRUE])

finaldataset <- dataset[indices, ]

# just to let you see the 3 classes
levels(finaldataset$Species)

# create the random forest
randomForest(formula = Species ~ ., data =
finaldataset, ntree = 5)

# The error message I get
Error in randomForest.default(m, y, ...) : 
Can't have empty classes in y.

# The problem is that finaldataset doesn't contain
# any instances of "versicolor", so I think the only
# way to solve this problem is by changing the levels
# of Species to only "setosa" and "virginica";
# correct me if I'm wrong.

# So I tried to change the levels but I got stuck:

# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))

# the problem!
newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
  uniqueItems[2]), uniqueItems[3] = uniqueItems[3])

# Error message
Error: syntax error

# In the help they use constant names to rename the
# levels, so this works (but that's not what I want
# because I don't want to change the code every time I
# use another data set):
newlevels <- list("setosa" = c(uniqueItems[1],
  uniqueItems[2]), "virginica" = uniqueItems[3])

levels(finaldataset$Species) <- newlevels

levels(finaldataset$Species)

finaldataset$Species

---

Thanks in advance,

Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Matrix oriented computing

2005-08-26 Thread Prof Brian Ripley
Try profiling.  Doing this many times to get an overview, e.g. for sapply 
with df=1:1000:

%   self%   total
  self secondstotalsecondsname
  98.26  6.78 98.26  6.78 FUN
   0.58  0.04  0.58  0.04 unlist
   0.29  0.02  0.87  0.06 as.vector
   0.29  0.02  0.58  0.04 names<-
   0.29  0.02  0.29  0.02 names<-.default
   0.29  0.02  0.29  0.02 names

so almost all the time is in qchisq.
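For readers unfamiliar with the profiler, a table like the one above can be
produced along these lines (a sketch; the output file name is arbitrary):

```r
x  <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- 1:1000

Rprof("qchisq.out")                             # start the sampling profiler
for (i in 1:100) mat <- sapply(x, qchisq, df)   # repeat to accumulate samples
Rprof(NULL)                                     # stop profiling
summaryRprof("qchisq.out")$by.self              # per-function table as above
```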


On Fri, 26 Aug 2005, Marc Schwartz (via MN) wrote:

 On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
 Marc Schwartz [EMAIL PROTECTED] writes:

 x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9,
0.95, 0.975, 0.99, 0.995)

 df <- c(1:100)

 mat <- sapply(x, qchisq, df)

 dim(mat)
 [1] 100  11

 str(mat)
  num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...

 outer() is perhaps a more natural first try... It does give the
 transpose of the sapply approach, though.

 round(t(outer(x,df,qchisq)),2)

 should be close. You should likely add dimnames.



 What I find interesting, is that I would have intuitively expected
 outer() to be faster than sapply().  However:


  system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
 [1] 0.01 0.00 0.01 0.00 0.00

  system.time(mat1 <- round(t(outer(x, df, qchisq)), 2),
   gcFirst = TRUE)
 [1] 0.01 0.00 0.01 0.00 0.00

 # No round() or t() to test for overhead
  system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
 [1] 0.01 0.00 0.02 0.00 0.00


 # Bear in mind the round() on mat1 above
 all.equal(mat, mat1)
 [1] Mean relative  difference: 4.905485e-05

 all.equal(mat, t(mat2))
 [1] TRUE


 Even when increasing the size of 'df' to 1:1000:


  system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
 [1] 0.16 0.01 0.16 0.00 0.00

  system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst =
 TRUE)
 [1] 0.16 0.00 0.18 0.00 0.00

  # No round() or t() to test for overhead
  system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
 [1] 0.16 0.01 0.17 0.00 0.00



 It also seems that, at least in this case, t() and round() do not add
 much overhead.

Definitely not for such small matrices.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Matrix oriented computing

2005-08-26 Thread John Fox
Dear Marc,

For that matter, the loop isn't a whole lot slower (on my 3GHz Win XP system):

 x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
 df <- 1:1000
 system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.08 0.00 0.08   NA   NA
 
 mat <- matrix(0, 1000, 11)
 system.time(for (i in 1:length(df)) mat[i,] <- qchisq(x, df[i]))
[1] 0.09 0.00 0.10   NA   NA
 

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Marc 
 Schwartz (via MN)
 Sent: Friday, August 26, 2005 10:26 AM
 To: Peter Dalgaard
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Matrix oriented computing
 
 On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
  Marc Schwartz [EMAIL PROTECTED] writes:
  
   x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 
  0.95, 0.975, 0.99, 0.995)
   
   df <- c(1:100)
   
   mat <- sapply(x, qchisq, df)
  
dim(mat)
   [1] 100  11

str(mat)
num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 
 4.12e-01 ...
  
  outer() is perhaps a more natural first try... It does give the 
  transpose of the sapply approach, though.
  
  round(t(outer(x,df,qchisq)),2)
  
  should be close. You should likely add dimnames.
 
 
 
 What I find interesting, is that I would have intuitively expected
 outer() to be faster than sapply().  However:
 
 
    system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
  [1] 0.01 0.00 0.01 0.00 0.00
   
    system.time(mat1 <- round(t(outer(x, df, qchisq)), 2),
 gcFirst = TRUE)
  [1] 0.01 0.00 0.01 0.00 0.00
   
  # No round() or t() to test for overhead
    system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
 [1] 0.01 0.00 0.02 0.00 0.00
 
 
 # Bear in mind the round() on mat1 above
  all.equal(mat, mat1)
 [1] Mean relative  difference: 4.905485e-05
 
  all.equal(mat, t(mat2))
 [1] TRUE
 
 
 Even when increasing the size of 'df' to 1:1000:
 
 
    system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
  [1] 0.16 0.01 0.16 0.00 0.00
  
    system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst =
  TRUE)
  [1] 0.16 0.00 0.18 0.00 0.00
   
    # No round() or t() to test for overhead
    system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
 [1] 0.16 0.01 0.17 0.00 0.00
 
 
 
 It also seems that, at least in this case, t() and round() do 
 not add much overhead.
 
 Best regards,
 
 Marc
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Matrix oriented computing

2005-08-26 Thread Marc Schwartz (via MN)
Prof. Ripley,

Excellent point. Neither sapply() nor outer() is the elephant in the
room in this situation.



On Fri, 2005-08-26 at 16:55 +0100, Prof Brian Ripley wrote:
 Try profiling.  Doing this many times to get an overview, e.g. for sapply 
 with df=1:1000:
 
 %   self%   total
   self secondstotalsecondsname
   98.26  6.78 98.26  6.78 FUN
0.58  0.04  0.58  0.04 unlist
0.29  0.02  0.87  0.06 as.vector
     0.29  0.02  0.58  0.04 names<-
     0.29  0.02  0.29  0.02 names<-.default
0.29  0.02  0.29  0.02 names
 
 so almost all the time is in qchisq.
 
 
 On Fri, 26 Aug 2005, Marc Schwartz (via MN) wrote:
 
  On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
  Marc Schwartz [EMAIL PROTECTED] writes:
 
   x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9,
  0.95, 0.975, 0.99, 0.995)
  
   df <- c(1:100)
  
   mat <- sapply(x, qchisq, df)
 
  dim(mat)
  [1] 100  11
 
  str(mat)
   num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...
 
  outer() is perhaps a more natural first try... It does give the
  transpose of the sapply approach, though.
 
  round(t(outer(x,df,qchisq)),2)
 
  should be close. You should likely add dimnames.
 
 
 
  What I find interesting, is that I would have intuitively expected
  outer() to be faster than sapply().  However:
 
 
    system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
   [1] 0.01 0.00 0.01 0.00 0.00
  
    system.time(mat1 <- round(t(outer(x, df, qchisq)), 2),
 gcFirst = TRUE)
   [1] 0.01 0.00 0.01 0.00 0.00
  
   # No round() or t() to test for overhead
    system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
  [1] 0.01 0.00 0.02 0.00 0.00
 
 
  # Bear in mind the round() on mat1 above
  all.equal(mat, mat1)
  [1] Mean relative  difference: 4.905485e-05
 
  all.equal(mat, t(mat2))
  [1] TRUE
 
 
  Even when increasing the size of 'df' to 1:1000:
 
 
    system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
   [1] 0.16 0.01 0.16 0.00 0.00
  
    system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst =
   TRUE)
   [1] 0.16 0.00 0.18 0.00 0.00
  
    # No round() or t() to test for overhead
    system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
  [1] 0.16 0.01 0.17 0.00 0.00
 
 
 
  It also seems that, at least in this case, t() and round() do not add
  much overhead.
 
 Definitely not for such small matrices.


True, and both are C functions, which of course helps as well.

Best regards,

Marc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] parts of data frames: subset vs. [-c()]

2005-08-26 Thread Stefan Th. Gries
Dear all

I have a problem with splitting up a data frame called ReVerb:

» str(ReVerb)
`data.frame':   92713 obs. of  16 variables:
 $ CHILD    : Factor w/ 7 levels "ABE","ADA","EVE",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ AGE      : Factor w/ 484 levels "1;06.00","1;06.16",..: 43 43 43 99 99 99 99 
99 99 99 ...
 $ AGE_Q    : num  2.0 2.0 2.0 2.4 2.4 ...
 $ INTERVALS: num  2 2 2 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ...
 $ RND      : int  34368 38311 14949 20586 72516 27186 88019 10767 114448 86146 
...
 $ SYNTAX   : Factor w/ 17 levels "Acmp","Amats",..: 15 12 8 15 7 16 7 7 16 7 
...
 $ LEXICAL  : Factor w/ 1643 levels "$ACHE","$ACT",..: 194 803 803 294 299 803 
1562 299 679 1562 ...
 $ MORPH    : Factor w/ 337 levels "$","$ =inf","$ =prs",..: 9 20 9 39 184 231 
57 67 231 39 ...
 $ COMPLEM  : Factor w/ 1989 levels "$","$ V PR=Lp [1.2]",..: 203 547 220 203 
1101 368 1834 1667 368 1834 ...
 $ MATRIX   : Factor w/ 906 levels "$ ???","$ be PR=Aen",..: 5 5 5 308 5 856 5 
5 856 308 ...
 $ SITUATION: Factor w/ 9 levels "[imitation of Mom: you know what I said]",..: 
2 2 2 2 2 2 2 2 2 2 ...
 $ V_ANN    : int  1 1 1 4 4 4 4 3 3 3 ...
 $ QUEST    : int  0 0 0 0 0 0 0 0 0 0 ...
 $ EXCL     : int  0 0 0 1 1 1 1 0 0 0 ...
 $ U_LEN    : int  3 4 5 13 13 13 13 8 8 8 ...
 $ UTTERANCE: Factor w/ 55113 levels "","# (be)cause he wanted to .",..: 5696 
39091 52180 2262 2262 2262 2262 3593 3593 3593 ...

The level causing the problem is SYNTAX:

» as.data.frame(sort(table(SYNTAX)))
  sort(table(SYNTAX))
Particles 100
PR=N1 144
Amats 271
Trans_PR=A2   787
Ditrans  1181
Intrans_PR=A11399
Acmp 2402
Trans_PR=V2  2433
CPcmps   2769
Vpreps   4896
Intrans_V0   5182
Trans_PR=L2  7653
Trans_V028117
Intrans_PR=L18457
Intrans_V1   9643
Intrans_PR=V1   14987
Trans_V12   22288


I would like to extract all cases where SYNTAX=="Ditrans" from ReVerb, store 
that in a file, and then generate ReVerb again without these cases and factor 
levels. My problem is probably obvious from the following lines of code:

» ditrans <- which(SYNTAX=="Ditrans")
» ReVerb1 <- ReVerb[-c(ditrans),]; dim(ReVerb1)
[1] 91532    16
» 
» # ok, so the 92713-91532=1181 cases where SYNTAX=="Ditrans" have been 
removed, but ...
» 
» ReVerb1 <- subset(ReVerb, SYNTAX!="Ditrans"); dim(ReVerb1)
[1] 91528    16
» 
» # ... so why don't I get 91532 again as the number of rows?
» 
Any ideas??

» R.version # on Windows XP with service Pack 2
 _  
platform i386-pc-mingw32
arch i386   
os   mingw32
system   i386, mingw32  
status  
major2  
minor1.1
year 2005   
month06 
day  20 
language R  

Thanks a lot,
STG
--
Stefan Th. Gries

Max Planck Inst. for Evol. Anthropology
http://people.freenet.de/Stefan_Th_Gries



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problem with certain data sets when using randomForest

2005-08-26 Thread Prof Brian Ripley
Look at ?"[.factor":

finaldataset$Species <- finaldataset$Species[, drop=TRUE]

solves this.
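A minimal sketch of that idiom on a toy factor (in later versions of R,
droplevels() does the same job):

```r
f <- factor(c("setosa", "setosa", "virginica"),
            levels = c("setosa", "versicolor", "virginica"))
levels(f)                # all three levels; "versicolor" is empty
f2 <- f[, drop = TRUE]   # drop the unused level, as in ?"[.factor"
levels(f2)               # only "setosa" and "virginica" remain
```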

On Fri, 26 Aug 2005, Martin Lam wrote:

 Hi,

 Since I've had no replies to my previous post about my
 problem, I am posting it again in the hope that someone
 notices it. The problem is that the randomForest
 function doesn't accept datasets whose instances
 cover only a subset of all the classes. So a
 dataset with instances that belong only to class "a"
 or "b" out of the levels "a", "b" and "c" doesn't work
 because there is no instance of class "c". Is
 there any way to solve this problem?

 library(randomForest)

 # load the iris plant data set
 dataset <- iris

 numberarray <- array(1:nrow(dataset), nrow(dataset),
 1)

 # include only instances with Species = "setosa" or
 # "virginica"
 indices <- t(numberarray[(dataset$Species == "setosa" |
   dataset$Species == "virginica") == TRUE])

 finaldataset <- dataset[indices,]

 # just to let you see the 3 classes
 levels(finaldataset$Species)

 # create the random forest
 randomForest(formula = Species ~ ., data =
 finaldataset, ntree = 5)

 # The error message I get
 Error in randomForest.default(m, y, ...) :
        Can't have empty classes in y.

 # The problem is that finaldataset doesn't contain
 # any instances of "versicolor", so I think the only
 # way to solve this problem is by changing the levels
 # the Species have to only "setosa" and "virginica";
 # correct me if I'm wrong.

 # So I tried to change the levels but I got stuck:

 # get the possible unique classes
 uniqueItems <- unique(levels(finaldataset$Species))

 # the problem!
 newlevels <- list(uniqueItems[1] = c(uniqueItems[1],
   uniqueItems[2]), uniqueItems[3] = uniqueItems[3])

 # Error message
 Error: syntax error

 # In the help they use constant names to rename the
 # levels, so this works (but that's not what I want
 # because I don't want to change the code every time I
 # use another data set):
 newlevels <- list("setosa" = c(uniqueItems[1],
   uniqueItems[2]), "virginica" = uniqueItems[3])

 levels(finaldataset$Species) <- newlevels

 levels(finaldataset$Species)

 finaldataset$Species

 ---

 Thanks in advance,

 Martin

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Matrix oriented computing

2005-08-26 Thread Gabor Grothendieck
On 8/26/05, Marc Schwartz (via MN) [EMAIL PROTECTED] wrote:
 On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
  Marc Schwartz [EMAIL PROTECTED] writes:
 
   x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9,
  0.95, 0.975, 0.99, 0.995)
  
   df <- c(1:100)
  
   mat <- sapply(x, qchisq, df)
  
dim(mat)
   [1] 100  11
  
str(mat)
num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...
 
  outer() is perhaps a more natural first try... It does give the
  transpose of the sapply approach, though.
 
  round(t(outer(x,df,qchisq)),2)
 
  should be close. You should likely add dimnames.
 
 
 
 What I find interesting, is that I would have intuitively expected
 outer() to be faster than sapply().  However:
 
 
    system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
  [1] 0.01 0.00 0.01 0.00 0.00
  
    system.time(mat1 <- round(t(outer(x, df, qchisq)), 2),
    gcFirst = TRUE)
  [1] 0.01 0.00 0.01 0.00 0.00
  
  # No round() or t() to test for overhead
    system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
 [1] 0.01 0.00 0.02 0.00 0.00
 
 
 # Bear in mind the round() on mat1 above
  all.equal(mat, mat1)
 [1] Mean relative  difference: 4.905485e-05
 
  all.equal(mat, t(mat2))
 [1] TRUE
 
 
 Even when increasing the size of 'df' to 1:1000:
 
 
    system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
  [1] 0.16 0.01 0.16 0.00 0.00
  
    system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst =
  TRUE)
  [1] 0.16 0.00 0.18 0.00 0.00
  
    # No round() or t() to test for overhead
    system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
 [1] 0.16 0.01 0.17 0.00 0.00
 
 
 
 It also seems that, at least in this case, t() and round() do not add
 much overhead.
 

You might need to do it repeatedly to get a more reliable reading.
When I do it 1000 times outer is faster than sapply though not by much:

 n <- 1000
 system.time(for (i in 1:n) mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 14.05  0.00 14.43    NA    NA
 
 system.time(for (i in 1:n) mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 13.42  0.00 13.85    NA    NA
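Whichever of the two is used, Peter Dalgaard's earlier point about adding
dimnames is worth following up, since it lets the table be indexed by
probability and degrees of freedom directly; a sketch:

```r
x  <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- 1:100
mat <- t(outer(x, df, qchisq))        # rows = degrees of freedom, cols = p
dimnames(mat) <- list(df = df, p = x)
mat["5", "0.95"]                      # 0.95 quantile on 5 df, about 11.07
```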

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] parts of data frames: subset vs. [-c()]

2005-08-26 Thread Peter Dalgaard
Stefan Th. Gries [EMAIL PROTECTED] writes:

 Dear all
 
 I have a problem with splitting up a data frame called ReVerb:
 
 » str(ReVerb)
 `data.frame':   92713 obs. of  16 variables:
  $ CHILD: Factor w/ 7 levels ABE,ADA,EVE,..: 1 1 1 1 1 1 1 1 1 1 ...
  $ AGE  : Factor w/ 484 levels 1;06.00,1;06.16,..: 43 43 43 99 99 99 
 99 99 99 99 ...
  $ AGE_Q: num  2.0 2.0 2.0 2.4 2.4 ...
  $ INTERVALS: num  2 2 2 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ...
  $ RND  : int  34368 38311 14949 20586 72516 27186 88019 10767 114448 
 86146 ...
  $ SYNTAX   : Factor w/ 17 levels Acmp,Amats,..: 15 12 8 15 7 16 7 7 16 7 
 ...
  $ LEXICAL  : Factor w/ 1643 levels $ACHE,$ACT,..: 194 803 803 294 299 
 803 1562 299 679 1562 ...
  $ MORPH: Factor w/ 337 levels $,$ =inf,$ =prs,..: 9 20 9 39 184 
 231 57 67 231 39 ...
  $ COMPLEM  : Factor w/ 1989 levels $,$ V PR=Lp [1.2],..: 203 547 220 203 
 1101 368 1834 1667 368 1834 ...
  $ MATRIX   : Factor w/ 906 levels $ ???,$ be PR=Aen,..: 5 5 5 308 5 856 
 5 5 856 308 ...
  $ SITUATION: Factor w/ 9 levels [imitation of Mom: you know what I 
 said],..: 2 2 2 2 2 2 2 2 2 2 ...
  $ V_ANN: int  1 1 1 4 4 4 4 3 3 3 ...
  $ QUEST: int  0 0 0 0 0 0 0 0 0 0 ...
  $ EXCL : int  0 0 0 1 1 1 1 0 0 0 ...
  $ U_LEN: int  3 4 5 13 13 13 13 8 8 8 ...
  $ UTTERANCE: Factor w/ 55113 levels ,# (be)cause he wanted to .,..: 5696 
 39091 52180 2262 2262 2262 2262 3593 3593 3593 ...
 
 The level causing the problem is SYNTAX:
 
 » as.data.frame(sort(table(SYNTAX)))
   sort(table(SYNTAX))
 Particles 100
 PR=N1 144
 Amats 271
 Trans_PR=A2   787
 Ditrans  1181
 Intrans_PR=A11399
 Acmp 2402
 Trans_PR=V2  2433
 CPcmps   2769
 Vpreps   4896
 Intrans_V0   5182
 Trans_PR=L2  7653
 Trans_V028117
 Intrans_PR=L18457
 Intrans_V1   9643
 Intrans_PR=V1   14987
 Trans_V12   22288
 
 
 I would like to extract all cases where SYNTAX==Ditrans from ReVerb, store 
 that in a file, and then generate ReVerb again without these cases and factor 
 levels. My problem is probably obvious from the following lines of code:
 
 » ditrans <- which(SYNTAX=="Ditrans")
 » ReVerb1 <- ReVerb[-c(ditrans),]; dim(ReVerb1)
 [1] 91532    16
 » 
 » # ok, so the 92713-91532=1181 cases where SYNTAX=="Ditrans" have been 
 removed, but ...
 » 
 » ReVerb1 <- subset(ReVerb, SYNTAX!="Ditrans"); dim(ReVerb1)
 [1] 91528    16
 » 
 » # ... so why don't I get 91532 again as the number of rows?
 » 
 Any ideas??

The SYNTAX variable is not necessarily the same. Could you retry the
first case with 

 ditrans <- which(ReVerb$SYNTAX=="Ditrans")

? 

Otherwise, try doing a setdiff() on the rownames of the two discrepant
results and see which are the four cases that differ.
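The rowname comparison can be sketched on made-up data like this (the real
ReVerb data frame is of course not available here):

```r
a <- data.frame(v = 1:5, row.names = letters[1:5])
b <- a[-c(2, 4), , drop = FALSE]     # remove two rows
setdiff(rownames(a), rownames(b))    # the rows present in a but not in b
```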

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] class 'named'

2005-08-26 Thread Phineas Campbell
I'm working through the examples in Venables and Ripley in the 'New-style
Classes' chapter.

On a call to representation(), in the lda example, it is unable to find the
class "named".

Is the class "named" defined anywhere?  I've loaded the methods package but
this hasn't helped.

Phineas Campbell

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Help in Compliling user -defined functions in Rpart

2005-08-26 Thread Luwis Tapiwa Diya
I have been trying to write my own user-defined function in rpart. I
imitated the anova splitting rule, which is given as an example. In the
work I am doing, I am calculating the concentration index (ci), which
lies between -1 and +1, so my deviance is given by
abs(ci)*(1-abs(ci)). Now, when I run rpart incorporating this
user-defined function, I get the following error message:

 Error in user.split(yback[1:nback], wback[1:nback], xback[1:nback], parms,  : 
        unused argument(s) ( ...)

Now I am failing to identify where I am going wrong (in case I
have made some mistake), so I was wondering whether anybody
has written user-defined functions of their own, and whether there
is any documentation, with examples, on user-defined functions.

Regards ,

Luwis Diya

#---------- User defined function ----------#

temp.init <- function(y, offset, parms, wt) {
    if (!is.null(offset)) y <- y - offset
    if (is.matrix(y)) stop("response must be a vector")

    list(y = y, parms = 0, numy = 1, numresp = 1,
         summary = function(yval, dev, wt, ylevel, digits) {
             paste("mean=", format(signif(yval, digits)),
                   " MSE=", format(signif(dev/wt, digits)),
                   sep = '')
         })
}


temp.eval <- function(y, wt, parms) {
    n <- length(y)
    r <- wt
    for (i in 1:(n-1)) r[i+1] <- (sum(wt[1:i]) + 0.5*wt[i+1])/n  # fractional rank
    r[1] <- 0.5*wt[1]/n
    wmean <- sum(y*wt)/sum(wt)
    ci <- 2*sum(wt*(y-wmean)*(r-0.5))/sum(wt*y)  # concentration index for
                                                 # socio-economic inequality
    dev <- abs(ci)*(1-abs(ci))                   # deviance following the
                                                 # gini impurity approach
    list(label = wmean, deviance = dev)
}

 
temp.split <- function(y, wt, parms, continuous) {
    n <- length(y)
    r <- wt
    for (i in 1:(n-1)) r[i+1] <- (sum(wt[1:i]) + 0.5*wt[i+1])/n
    r[1] <- 0.5*wt[1]/n
    wmean <- sum(y*wt)/sum(wt)
    ci <- 2*sum(wt*(y-wmean)*(r-0.5))/sum(wt*y)
    devci <- abs(ci)*(1-abs(ci))

    if (continuous) {
        lss <- cumsum(wt*y)[-n]
        rss <- sum(wt*y) - lss
        lw <- cumsum(wt)[-n]
        rw <- sum(wt) - lw
        lm <- lss/lw
        rm <- rss/rw
        lcss <- cumsum(wt[1:length(lm)]*(y[1:length(lm)]-lm)*(r[1:length(lm)]-0.5))
        rcss <- sum(wt*(y-wmean)*(r-0.5)) - lcss
        lci <- 2*lcss/lss                # concentration index for left side
        rci <- 2*rcss/rss                # concentration index for right side
        devlci <- abs(lci)*(1-abs(lci))  # deviance for left side
        devrci <- abs(rci)*(1-abs(rci))  # deviance for right side

        goodness <- devci - (lw/sum(wt))*devlci - (rw/sum(wt))*devrci
        list(goodness = goodness, direction = sign(lci))
    }
    else {
        ux <- sort(unique(x))
        wtsum <- tapply(wt, x, sum)
        ysum <- tapply(wt*y, x, sum)
        means <- ysum/wtsum

        ord <- order(means)
        n <- length(ord)
        lss <- cumsum(ysum[ord])[-n]
        rss <- sum(ysum) - lss
        lw <- cumsum(wtsum[ord])[-n]
        rw <- sum(wtsum) - lw
        lm <- lss/lw
        rm <- rss/rw
        lysum <- tapply(wt*(y-lm)*(r-0.5), x, sum)
        lcss <- cumsum(lysum[ord])[-n]
        rcss <- sum(lysum) - lcss
        lci <- 2*lcss/lss
        rci <- 2*rcss/rss
        devlci <- abs(lci)*(1-abs(lci))
        devrci <- abs(rci)*(1-abs(rci))

        goodness <- devci - 0.5*(lw/sum(wt))*devlci - 0.5*(rw/sum(wt))*devrci
        list(goodness = goodness, direction = sign(lci))
    }
}

alist <- list(eval = temp.eval, split = temp.split, init = temp.init)
tree <- rpart(u ~ pcares + antcare.skilled + riskintb + child.born + married + mage1 + mage2,
              weights = popweight, method = alist)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Unpaste Problem

2005-08-26 Thread A Mani
Hello,
     Are there easy ways to unpaste?
 xp <- paste(x2, x3)  # x2, x3 are two non-numeric columns.
.
.
xfg <- data.frame(xp, sc1, sc2, sc3)  # sc1, sc2, sc3 are numeric cols.

I want xp to be split up to form a new data frame of the form (x3, sc1,
sc2, sc3).
IMPORTANT info: elements of xp have the form "abc efg" (the two parts
joined by a space), with "abc" in x2 and "efg" in x3.
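One possible approach (a sketch on made-up values, since the real columns are
not shown) is to split the pasted column back apart with strsplit():

```r
xp  <- c("abc efg", "foo bar")                 # pasted "x2 x3" values
sc1 <- c(1.5, 2.5)                             # one of the numeric columns
parts <- do.call(rbind, strsplit(as.character(xp), " ", fixed = TRUE))
newdf <- data.frame(x3 = parts[, 2], sc1 = sc1)
newdf$x3                                       # the recovered x3 column
```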

Thanks in advance, 
-- 
A. Mani
Member, Cal. Math. Soc

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] compare c-index of two logistic models using rcorrp.senc() of the Hmisc library

2005-08-26 Thread Osman Al-Radi
Dear R-help,

Would it be appropriate to do the following to
calculate a p-value for the difference between the c-index
of x1 and the c-index of x2, using the output from
rcorrp.cens()?

 r <- rcorrp.cens(x1, x2, y)

 pValue <- 1 - pnorm((r[11]-r[12])/(r[2]/r[5])*1.96) 

Osman O. Al-Radi, MD, MSc, FRCSC
Chief Resident, Cardiac Surgery
University of Toronto, Canada

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] compare c-index of two logistic models using rcorrp.senc() of the Hmisc library

2005-08-26 Thread Frank E Harrell Jr
Osman Al-Radi wrote:
 Dear R-help,
 
 Would it be appropriate to do the following to
 calculate a p-value for the difference between the c-index
 of x1 and the c-index of x2, using the output from
 rcorrp.cens()?
 
 
 r <- rcorrp.cens(x1, x2, y)
 
 
 pValue <- 1 - pnorm((r[11]-r[12])/(r[2]/r[5])*1.96) 
 
 
 Osman O. Al-Radi, MD, MSc, FRCSC
 Chief Resident, Cardiac Surgery
 University of Toronto, Canada

Osman,

Because tests for differences in two ROC areas are not very powerful, 
rcorrp.cens changes the hypothesis to "are predictions from one method 
more concordant with the outcome than predictions from the other method, 
within paired predictions?".  You can't get a difference in ROC areas 
from the U-statistic computed by rcorrp.cens.

Frank



-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] parts of data frames: subset vs. [-c()]

2005-08-26 Thread Prof Brian Ripley

Are there NAs in the variable?

SYNTAX=="Ditrans" and SYNTAX!="Ditrans" are not complementary: an NA value
satisfies neither condition.
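The effect can be seen on a small made-up example:

```r
d <- data.frame(SYNTAX = factor(c("Ditrans", "Acmp", NA, "Acmp")), n = 1:4)

nrow(d[-which(d$SYNTAX == "Ditrans"), ])  # 3: which() skips NA, so the NA row stays
nrow(subset(d, SYNTAX != "Ditrans"))      # 2: subset() drops rows where the test is NA
```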


On Fri, 26 Aug 2005, Stefan Th. Gries wrote:


Dear all

I have a problem with splitting up a data frame called ReVerb:

» str(ReVerb)
`data.frame':   92713 obs. of  16 variables:
$ CHILD: Factor w/ 7 levels ABE,ADA,EVE,..: 1 1 1 1 1 1 1 1 1 1 ...
$ AGE  : Factor w/ 484 levels 1;06.00,1;06.16,..: 43 43 43 99 99 99 99 
99 99 99 ...
$ AGE_Q: num  2.0 2.0 2.0 2.4 2.4 ...
$ INTERVALS: num  2 2 2 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ...
$ RND  : int  34368 38311 14949 20586 72516 27186 88019 10767 114448 86146 
...
$ SYNTAX   : Factor w/ 17 levels Acmp,Amats,..: 15 12 8 15 7 16 7 7 16 7 ...
$ LEXICAL  : Factor w/ 1643 levels $ACHE,$ACT,..: 194 803 803 294 299 803 
1562 299 679 1562 ...
$ MORPH: Factor w/ 337 levels $,$ =inf,$ =prs,..: 9 20 9 39 184 231 
57 67 231 39 ...
$ COMPLEM  : Factor w/ 1989 levels $,$ V PR=Lp [1.2],..: 203 547 220 203 
1101 368 1834 1667 368 1834 ...
$ MATRIX   : Factor w/ 906 levels $ ???,$ be PR=Aen,..: 5 5 5 308 5 856 5 5 
856 308 ...
$ SITUATION: Factor w/ 9 levels [imitation of Mom: you know what I said],..: 
2 2 2 2 2 2 2 2 2 2 ...
$ V_ANN: int  1 1 1 4 4 4 4 3 3 3 ...
$ QUEST: int  0 0 0 0 0 0 0 0 0 0 ...
$ EXCL : int  0 0 0 1 1 1 1 0 0 0 ...
$ U_LEN: int  3 4 5 13 13 13 13 8 8 8 ...
$ UTTERANCE: Factor w/ 55113 levels ,# (be)cause he wanted to .,..: 5696 
39091 52180 2262 2262 2262 2262 3593 3593 3593 ...

The level causing the problem is SYNTAX:

» as.data.frame(sort(table(SYNTAX)))
 sort(table(SYNTAX))
Particles 100
PR=N1 144
Amats 271
Trans_PR=A2   787
Ditrans  1181
Intrans_PR=A11399
Acmp 2402
Trans_PR=V2  2433
CPcmps   2769
Vpreps   4896
Intrans_V0   5182
Trans_PR=L2  7653
Trans_V028117
Intrans_PR=L18457
Intrans_V1   9643
Intrans_PR=V1   14987
Trans_V12   22288


I would like to extract all cases where SYNTAX==Ditrans from ReVerb, store 
that in a file, and then generate ReVerb again without these cases and factor levels. My 
problem is probably obvious from the following lines of code:

» ditrans <- which(SYNTAX=="Ditrans")
» ReVerb1 <- ReVerb[-c(ditrans),]; dim(ReVerb1)
[1] 91532    16
»
» # ok, so the 92713-91532=1181 cases where SYNTAX=="Ditrans" have been 
removed, but ...
»
» ReVerb1 <- subset(ReVerb, SYNTAX!="Ditrans"); dim(ReVerb1)
[1] 91528    16
»
» # ... so why don't I get 91532 again as the number of rows?
»
Any ideas??


--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] class 'named'

2005-08-26 Thread Prof Brian Ripley
That is one of the S4 vs R differences.  See the complements.

On Fri, 26 Aug 2005, Phineas Campbell wrote:

 I'm working through the examples in Venables and Ripley in the 'New-style
 Classes' chapter.

 On a call to representation(), in the lda example, it is unable to find the
 class "named".

 Is the class "named" defined anywhere?  I've loaded the methods package but
 this hasn't helped.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Help in Compliling user -defined functions in Rpart

2005-08-26 Thread Prof Brian Ripley
On Fri, 26 Aug 2005, Luwis Tapiwa Diya wrote:

 I have been trying to write my own user-defined function in rpart. I
 imitated the anova splitting rule, which is given as an example. In the
 work I am doing, I am calculating the concentration index (ci), which
 lies between -1 and +1, so my deviance is given by
 abs(ci)*(1-abs(ci)). Now, when I run rpart incorporating this
 user-defined function, I get the following error message:

 Error in user.split(yback[1:nback], wback[1:nback], xback[1:nback], parms,  :
        unused argument(s) ( ...)

 Now I am failing to identify where I am going wrong (in case I
 have made some mistake), so I was wondering whether anybody
 has written user-defined functions of their own, and whether there
 is any documentation, with examples, on user-defined functions.

There is a commented example in the tests directory (of the sources).

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] problem with certain data sets when using randomForest

2005-08-26 Thread Martin Lam
Thank you for this and earlier help Mr. Ripley.

Martin

--- Prof Brian Ripley [EMAIL PROTECTED] wrote:

 Look at ?"[.factor":
 
   finaldataset$Species <-
  finaldataset$Species[, drop=TRUE]
 
 solves this.
 
 On Fri, 26 Aug 2005, Martin Lam wrote:
 
  Hi,
 
   Since I've had no replies to my previous post about my
   problem, I am posting it again in the hope that someone
   notices it. The problem is that the randomForest
   function doesn't accept datasets whose instances
   cover only a subset of all the classes. So a
   dataset with instances that belong only to class "a"
   or "b" out of the levels "a", "b" and "c" doesn't work
   because there is no instance of class "c". Is
   there any way to solve this problem?
 
  library(randomForest)
 
  # load the iris plant data set
  dataset - iris
 
  numberarray - array(1:nrow(dataset),
 nrow(dataset),
  1)
 
  # include only instances with Species = setosa or
  virginica
  indices - t(numberarray[(dataset$Species ==
 setosa
  |
  dataset$Species == virginica) == TRUE])
 
  finaldataset - dataset[indices,]
 
  # just to let you see the 3 classes
  levels(finaldataset$Species)
 
  # create the random forest
  randomForest(formula = Species ~ ., data =
  finaldataset, ntree = 5)
 
  # The error message I get
  Error in randomForest.default(m, y, ...) :
 Can't have empty classes in y.
 
  #The problem is that the finaldataset doesn't
 contain
  #any instances of versicolor, so I think the
 only
  way #to solve this problem is by changing the
 levels
  the #Species have to only setosa and
 virginica,
  # correct me if I'm wrong.
 
  # So I tried to change the levels but I got stuck:
 
  # get the possible unique classes
  uniqueItems -
 unique(levels(finaldataset$Species))
 
  # the problem!
  newlevels - list(uniqueItems[1] =
 c(uniqueItems[1],
  uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
 
  # Error message
  Error: syntax error
 
  # In the help they use constant names to rename
 the
  #levels, so this works (but that's not what I want
  #because I don't want to change the code every
 time I
  #use another data set):
  newlevels - list(setosa = c(uniqueItems[1],
  uniqueItems[2]), virginica = uniqueItems[3])
 
  levels(finaldataset$Species) - newlevels
 
  levels(finaldataset$Species)
 
  finaldataset$Species
 
  ---
 
  Thanks in advance,
 
  Martin
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html
 
 
 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UK    Fax:  +44 1865 272595


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Pseudo-Voigt fit

2005-08-26 Thread ppancoska

Hi, I am sorry for this question, but I am trying to speed up an
application:
I need to fit many x-y data sets (input from text files) to a
4-parameter Pseudo-Voigt peak function.
Until now I used a SigmaPlot macro to do it (enclosed just in case...)
peaksign(q) = if(total(q)>q[1], 1, -1)
xatymin(q,r) = xatymax(q,max(r)-r)
[Parameters]
a = if(peaksign(y)>0, max(y), min(y)) ''Auto {{previous: 60.8286}}
b = fwhm(x,abs(y))/2 ''Auto {{previous: 0.656637}}
c = .5 ''Auto {{previous: 6.82973e-010}}
x0 = if(peaksign(y)>0, xatymax(x,y), xatymin(x,y)) ''Auto {{previous:
3.19308}}


[Equation]
f = a*(c*(1/(1+((x-x0)/b)^2))+(1-c)*exp(-0.5*((x-x0)/b)^2))

fit f to y

This is manageable for ~100 data sets, but it looks like the next project
would need to process ~1000-member sets.

I am not familiar enough with R to find the right information (although I
can use R in general).

I am also nearly sure that there should be a solution to this task out
there, ready to be modified...

Could you be so kind as to direct me to the right package or web site
with examples?

Thank you very much
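[Editor's note: no solution is reproduced in the archive here, but as a
hedged sketch, the same 4-parameter pseudo-Voigt can be fitted in R with
nls(); the function, start values and simulated data below are
illustrative assumptions, not code from the thread:]

```r
# Pseudo-Voigt: weighted sum of a Lorentzian and a Gaussian,
# same parameterisation as the SigmaPlot macro above
pvoigt <- function(x, a, b, c, x0)
  a * (c / (1 + ((x - x0)/b)^2) + (1 - c) * exp(-0.5 * ((x - x0)/b)^2))

# simulated single-peak data standing in for one input file
x <- seq(0, 10, length = 200)
y <- pvoigt(x, a = 60, b = 0.7, c = 0.5, x0 = 3.2) + rnorm(200, sd = 0.5)

# nonlinear least squares, with rough starting values taken from the data
fit <- nls(y ~ pvoigt(x, a, b, c, x0),
           start = list(a = max(y), b = 1, c = 0.5, x0 = x[which.max(y)]))
coef(fit)
```

Wrapping the nls() call in a loop (or lapply over file names read with
read.table) would cover the ~1000 data sets.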



Dr. Petr Pancoska
Department of Pathology
SUNY Stony Brook, NY 11794
phone:  (631)-444-3030

**

This e- mail message, including any attachments,
is for the sole use of the intended recipient(s) and may
contain confidential and privileged information.
Any unauthorized review, use, disclosure or distribution is prohibited.
If you are not the intended recipient, please contact the sender
by e-mail and destroy all copies of the original.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Unpaste Problem

2005-08-26 Thread Marc Schwartz (via MN)
On Fri, 2005-08-26 at 22:39 +0530, A Mani wrote:
 Hello,
 Easy ways to unpaste?
  xp <- paste(x2, x3) # x2, x3 are two non-numeric columns.
 .
 .
 xfg <- data.frame(xp, sc1, sc2, sc3) # sc1, sc2, sc3 are numeric cols.
 
 I want xp to be split up to form a new dataframe of the form (x3, sc1,
 sc2, sc3).
 IMPORTANT info: elements of xp have the form "abc<space>efg", with "abc"
 in x2 and "efg" in x3.
 
 Thanks in advance, 


I think I understand what you are trying to do. Hopefully the below may
be helpful:

# Create the data frame with 3 rows
x2 <- letters[1:3]
x3 <- LETTERS[1:3]
xp <- paste(x2, x3)

sc1 <- rnorm(3)
sc2 <- rnorm(3)
sc3 <- rnorm(3)

xfg <- data.frame(xp, sc1, sc2, sc3)

> xfg
   xp        sc1        sc2        sc3
1 a A  1.3479123 -1.0642578  0.2479218
2 b B -0.1586587  1.1237456 -1.3952176
3 c C  2.7807484 -0.9778066 -1.9322279


# Use strsplit() here to break apart 'xp' using " " as the split
# Use "[" in sapply() to get the second (2) element from each
# 'xp' list pair. Note that I use as.character() here, since xfg$xp is 
# a factor.
# See the output of: strsplit(as.character(xfg$xp), " ")
# for some insight into this approach

xp.split <- sapply(strsplit(as.character(xfg$xp), " "), "[", 2)


# show post split values
> xp.split
[1] "A" "B" "C"


# Now cbind it all together into a new data frame
# don't include column 1 from xfg (xp)
xfg.new <- cbind(xp.split, xfg[, -1])


> xfg.new
  xp.split        sc1        sc2        sc3
1        A  1.3479123 -1.0642578  0.2479218
2        B -0.1586587  1.1237456 -1.3952176
3        C  2.7807484 -0.9778066 -1.9322279


See ?strsplit for more information.
 
HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Creating factors from continuous variables

2005-08-26 Thread David James
What is the quickest way to create many categorical variables  
(factors) from continuous variables?

This is the approach that I have used:

# create sample data
N <- 20
x <- runif(N,0,1)

# setup ranges to define categories
x.a <- (x >= 0.0) & (x < 0.4)
x.b <- (x >= 0.4) & (x < 0.5)
x.c <- (x >= 0.5) & (x < 0.6)
x.d <- (x >= 0.6) & (x < 1.0)

# create factors
i <- runif(N,1,1)
x.new <- (i*1*x.a) + (i*2*x.b) + (i*3*x.c) + (i*4*x.d)
x.factor <- factor(x.new)

I'm looking for a better / simpler / more elegant / more robust (as  
the number of categories increases) way to do this.  I also don't  
like that my factor names can only be numbers in this example.  I  
would prefer a solution to take a form like the following (inspired  
by the hist function):

# define breakpoints
x.breaks = c(0, 0.4, 0.5, 0.6, 1.0)
x.factornames = c( "0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0" )
x.factor = unknown.function( x, x.breaks, x.factornames )

Thanks,
David

P.S. Here's what I have read to try to find the answer to my problem:
* Introductory Statistics with R
* A Brief Guide to R for Beginners in Econometrics
* Econometrics in R

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Creating factors from continuous variables

2005-08-26 Thread Berton Gunter
?cut

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
The business of the statistician is to catalyze the scientific learning
process.  - George E. P. Box
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of David James
 Sent: Friday, August 26, 2005 2:00 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Creating factors from continuous variables
 
 What is the quickest way to create many categorical variables  
 (factors) from continuous variables?
 
 This is the approach that I have used:
 
 # create sample data
 N <- 20
 x <- runif(N,0,1)
 
 # setup ranges to define categories
 x.a <- (x >= 0.0) & (x < 0.4)
 x.b <- (x >= 0.4) & (x < 0.5)
 x.c <- (x >= 0.5) & (x < 0.6)
 x.d <- (x >= 0.6) & (x < 1.0)
 
 # create factors
 i <- runif(N,1,1)
 x.new <- (i*1*x.a) + (i*2*x.b) + (i*3*x.c) + (i*4*x.d)
 x.factor <- factor(x.new)
 
 I'm looking for a better / simpler / more elegant / more robust (as  
 the number of categories increases) way to do this.  I also don't  
 like that my factor names can only be numbers in this example.  I  
 would prefer a solution to take a form like the following (inspired  
 by the hist function):
 
 # define breakpoints
 x.breaks = c(0, 0.4, 0.5, 0.6, 1.0)
 x.factornames = c( "0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0" )
 x.factor = unknown.function( x, x.breaks, x.factornames )
 
 Thanks,
 David
 
 P.S. Here's what I have read to try to find the answer to my problem:
 * Introductory Statistics with R
 * A Brief Guide to R for Beginners in Econometrics
 * Econometrics in R
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Creating factors from continuous variables

2005-08-26 Thread Prof Brian Ripley
?cut

This is in `An Introduction to R', the manual which ships with R and basic 
reading.
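[Editor's note: a sketch of cut() with the exact interface asked for;
the variable names follow the poster's example:]

```r
x <- runif(20)
x.breaks <- c(0, 0.4, 0.5, 0.6, 1.0)
x.factornames <- c("0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0")
# right = FALSE gives [a, b) intervals, matching the poster's logic;
# include.lowest = TRUE closes the last interval at 1.0
x.factor <- cut(x, breaks = x.breaks, labels = x.factornames,
                right = FALSE, include.lowest = TRUE)
table(x.factor)
```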

On Fri, 26 Aug 2005, David James wrote:

 What is the quickest way to create many categorical variables
 (factors) from continuous variables?

 This is the approach that I have used:

 # create sample data
 N <- 20
 x <- runif(N,0,1)

 # setup ranges to define categories
 x.a <- (x >= 0.0) & (x < 0.4)
 x.b <- (x >= 0.4) & (x < 0.5)
 x.c <- (x >= 0.5) & (x < 0.6)
 x.d <- (x >= 0.6) & (x < 1.0)

 # create factors
 i <- runif(N,1,1)
 x.new <- (i*1*x.a) + (i*2*x.b) + (i*3*x.c) + (i*4*x.d)
 x.factor <- factor(x.new)

 I'm looking for a better / simpler / more elegant / more robust (as
 the number of categories increases) way to do this.  I also don't
 like that my factor names can only be numbers in this example.  I
 would prefer a solution to take a form like the following (inspired
 by the hist function):

 # define breakpoints
 x.breaks = c(0, 0.4, 0.5, 0.6, 1.0)
 x.factornames = c( "0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0" )
 x.factor = unknown.function( x, x.breaks, x.factornames )

 Thanks,
 David

 P.S. Here's what I have read to try to find the answer to my problem:
 * Introductory Statistics with R
 * A Brief Guide to R for Beginners in Econometrics
 * Econometrics in R

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Unpaste Problem

2005-08-26 Thread Gabor Grothendieck
On 8/26/05, A Mani [EMAIL PROTECTED] wrote:
 Hello,
    Easy ways to unpaste?
  xp <- paste(x2, x3) # x2, x3 are two non-numeric columns.
 .
 .
 xfg <- data.frame(xp, sc1, sc2, sc3) # sc1, sc2, sc3 are numeric cols.
 
 I want xp to be split up to form a new dataframe of the form (x3, sc1,
 sc2, sc3).
 IMPORTANT info: elements of xp have the form "abc<space>efg", with "abc"
 in x2 and "efg" in x3.
 

See

https://www.stat.math.ethz.ch/pipermail/r-help/2005-August/076492.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] zoo, zooreg, ISOdatetime

2005-08-26 Thread David James
I create a zooreg object that runs from Jan-1-2002 0:00 to Jun-1-2005  
0:00...

regts.start = ISOdatetime(2002, 1, 1, hour=0, min=0, sec=0, tz="")
regts.end = ISOdatetime(2005, 6, 1, hour=0, min=0, sec=0, tz="")
regts.zoo <- zooreg( NA, regts.start, regts.end, deltat=3600 )

Upon inspection:
> regts.zoo[1:3]
2002-01-01 00:00:00 2002-01-01 01:00:00 2002-01-01 02:00:00
                 NA                  NA                  NA

> regts.zoo[29926:29928]
2005-05-31 22:00:00 2005-05-31 23:00:00 2005-06-01 00:00:00
                 NA                  NA                  NA

However:
> summary(regts.zoo)
Error in `row.names<-.data.frame`(`*tmp*`, value = c("2002-01-01
00:00:00",  :
 duplicate 'row.names' are not allowed

I don't understand why it claims that there are duplicate row.names.   
Any advice?

I probably could use the aggregate function to clean this up, but I  
don't see why it should be needed (provided that I do things properly  
in the first place).
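[Editor's note: no reply is archived here; one hedged diagnostic is to
check whether two index times format identically (e.g. across the
autumn DST change), since summary() uses the formatted times as row
names:]

```r
# Do any two index times print identically?  summary() turns the
# formatted times into row names, which must be unique.
tt <- time(regts.zoo)
sum(duplicated(format(tt)))         # 0 means all formatted times differ
format(tt)[duplicated(format(tt))]  # the offending stamps, if any
```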

Thanks,
David

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] PLSR: model notation and reliabilities

2005-08-26 Thread I.Ioannou

I'm new in both R and statistics. I did my homework, 
I tried the archives and whatever I managed to get 
from the sources, but still I need assistance with 
the plsr package.


I have a model with 2 core determinants, D1 and D2,
each made up of 3 indicators (D1a, D1b, D1c and so on).
Also I have 2 moderating variables (m1, m2), where
m1 moderates D1 and m2 moderates D2.
The dependent variable (Y) is also constructed from 3
indicators (Y1, Y2, Y3). Actually my model is far more
complicated; I just give a simplified example here.

Which is the correct notation for the model
(I'm skipping the cross-validation for the moment):

  MyModel <- plsr(Y1+Y2+Y3 ~ ((D1a+D1b+D1c)*m1) + ((D2a+D2b+D2c)*m2), ncomp=2)

or:

  Y  <- cbind(Y1,Y2,Y3)
  X1 <- cbind(D1a,D1b,D1c)
  X2 <- cbind(D2a,D2b,D2c)
  MyModel <- plsr( Y ~ (X1*m1) + (X2*m2), ncomp=2)


How do I calculate the internal composite reliability (ICR)?
Is the Average Variance Extracted (AVE) what is reported as
"% variance explained" in summary?

I tried something like this (the model uses the first notation
mentioned above, and the calculations below are simplified
just for clarity):

ncomp <- MyModel$ncomp
P   <- MyModel$loadings[,ncomp]
Q   <- MyModel$Yloadings[,ncomp]
# D1
f1  <- P["D1a"]
f2  <- P["D1b"]
f3  <- P["D1c"]
Sp  <- f1 + f2 + f3
Sp2 <- (f1 ^ 2) + (f2 ^ 2) + (f3 ^ 2)
Sth <- (1-(f1 ^ 2)) + (1-(f2 ^ 2)) + (1-(f3 ^ 2))
D1_ICR <- (Sp^2) / ( (Sp^2) + Sth)
D1_AVE <- Sp2 / ( Sp2 + Sth)

but the results do not seem to give me something meaningful.
For example, while cronbach(cbind(D1a,D1b,D1c)) gives me > 0.90,
the above computed D1_ICR gives me very low numbers (< .20).
Also, summary says "% variance explained" for X = 83.1 in the 1st component,
while my computed D1_AVE is unacceptable (< 10%).
Where did I go wrong? Or is it just my data?

Any help will be much appreciated.

TIA

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] ARIMA (seasonal) backcasting interpolation

2005-08-26 Thread David James
Thanks for everyone's help with zoo -- I think I've got my data set  
ready.  (The data consists of surface weather temperatures, from 2002  
to 2005, one observation per hour.  Some values are missing... i.e. NA)

I have three goals:

GOAL #1: Get the data into proper time series form, preserving frequency
information:
 w4.ts <- as.ts( w3.zoo, frequency=(1/3600) )
I hope that 1/3600 (0.0002778) is correct.  I chose it because my  
zooreg object reported that value.  This goes back to my choice of  
the ISOdatetime format, which required deltat=3600.

GOAL #2: Do an ARIMA analysis that takes into account seasonal variation
 a.1 <- arima(w4.ts, order=c(1,0,0),
              seasonal=list(order=c(0,1,0), period=12))
First, I'm not quite sure if I should set period=12 (months in a  
year) or period=365*24 (number of my observations in a year).  The  
documentation was unclear to me.

Second, I've noticed that the fracdiff command is useful to find  
appropriate (p,d,q) values for ARIMA models.  But I have not found a  
command that suggests reasonable values for the seasonal (p,d,q) values.

GOAL #3: Use the ARIMA analysis to fill in for NA values.  (I'm not
sure how to do this yet.  For example, I do not know if I will need
to use windowing to smooth my backcasted data.)
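[Editor's note: no archived reply addresses goal #3 directly. One
hedged sketch, assuming a basic structural model is acceptable, and
using a built-in monthly series rather than the hourly weather data
(StructTS cannot realistically handle frequency = 365*24):]

```r
# Fill NAs from the smoothed states of a basic structural model
y <- log(AirPassengers)
y[c(20, 50, 100)] <- NA            # knock out a few points
fit <- StructTS(y, type = "BSM")
sm  <- tsSmooth(fit)               # smoothed components: level, slope, sea
y.filled <- y
y.filled[is.na(y)] <- (sm[, "level"] + sm[, "sea"])[is.na(y)]
```

The smoothed level plus seasonal component serves as the model's best
estimate at the missing points, using observations on both sides.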

I would appreciate any pointers, references, or code examples.

Also, the terminology of backcasting and interpolation is not  
perfectly clear to me.  I'm certainly looking to do more than linear  
interpolation between data points ... that's why I'm hoping that  
ARIMA will help.  I need seasonal ARIMA, I believe, because there are  
seasonal swings in temperature.

Thanks,
David
[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] How to add values on the axes of the 3D bi-variable lrm fit?

2005-08-26 Thread Spencer Graves
  I have not seen a reply to this, so I will offer a few comments.

  1.  I haven't used lrm, but I assume you are referring to the copy 
in the Design package; I surmised that from reviewing 
RSiteSearch("lrm").

  2.  I installed Design and learned that I also needed Hmisc, so I 
installed that also.  The example you provided was not complete in 
itself, so that presented another obstacle to helping you.  I therefore 
worked through examples with the documentation for lrm, from which I 
learned that your fit was of class Design.

  3.  That led me to the documentation for plot.Design.  The examples 
there included one that produced a perspective plot.

  4.  I then read the documentation for persp and learned that it had 
an argument 'ticktype' with a nondefault value of "detailed", which 
should produce what you want.

  5.  When I added 'ticktype="detailed"' to the call to plot, I got 
an error message.  I then started reading the code for plot.Design and 
learned that it had an argument 'perspArgs'.  The documentation for 
plot.Design told me that was a list containing other named arguments 
to be passed to 'persp'.  I tried that and it worked.

  6.  In the future, I believe you can increase your chances of getting 
a useful reply quickly if you "PLEASE do read and follow the posting 
guide! http://www.R-project.org/posting-guide.html".

  Best Wishes,
  spencer graves

Jan Verbesselt wrote:

  
 
 Dear r-list,
 
  
 
 When I try to plot the following 3D lrm fit I obtain only arrows with labels
 on the three axes of the figure (without values).
 
  
 
 fit <- lrm(y ~ rcs(x1,knots)+rcs(x2,knots), tol=1e-14, X=T, Y=T)
 
 dd <- datadist(x1,x2); options(datadist='dd')
 
 par(mfrow=c(1,1))
 
 plot(fit,x1=NA, x2=NA, theta=50,phi=25)
 
  
 
 How can I add values to the axes of this plot?  (axes with the range of
 values of each of the explanatory variables x1 & x2)
 
  
 
 Thanks,
 
 Jan
 
  
 
  
 
 ___
 Ir. Jan Verbesselt
 Research Associate
 Group of Geomatics Engineering
 Department Biosystems ~ M³-BIORES
 Vital Decosterstraat 102, 3000 Leuven, Belgium
 Tel: +32-16-329750   Fax: +32-16-329760
 http://gloveg.kuleuven.ac.be/
 ___
 
  
 
 
   [[alternative HTML version deleted]]
 
 
 
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

[EMAIL PROTECTED]
www.pdf.com
Tel:  408-938-4420
Fax: 408-280-7915

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] ARIMA (seasonal) backcasting interpolation

2005-08-26 Thread Gabor Grothendieck
On 8/26/05, David James [EMAIL PROTECTED] wrote:
 Thanks for everyone's help with zoo -- I think I've got my data set
 ready.  (The data consists of surface weather temperatures, from 2002
 to 2005, one observation per hour.  Some values are missing... i.e. NA)

 I have three goals:

 GOAL #1: Get the data into proper time series form, preserving frequency
 information:
  w4.ts <- as.ts( w3.zoo, frequency=(1/3600) )
 I hope that 1/3600 (0.0002778) is correct.  I chose it because my
 zooreg object reported that value.  This goes back to my choice of
 the ISOdatetime format, which required deltat=3600.

I will just address the zoo portion of this question.

In the "ts" class a period is one unit, so let's assume we want the
resulting series to have a day represented as one unit.  Then we
first create a zoo series such that the integer part of the index is
the day, after which conversion is easy:

regts.day <- zoo(coredata(regts.zoo), as.numeric(time(regts.zoo))/(24*3600))
regts.ts <- as.ts(regts.day)

or convert it to chron first, in which case it's easy too:

library(chron)
regts.chron <- zoo(coredata(regts.zoo), as.chron(time(regts.zoo)))
regts.ts <- as.ts(regts.chron)

Of course, had chron been used in the first place, just the last line
would be needed.

Note that there are some issues as to which time zone is used during
the conversion, which I won't address here; you can avoid them by
just using chron right from the beginning.  If you do, you won't have
problems with daylight saving time, you won't have the problem that
the POSIXct and ts representations are very different, and you won't
have to worry about time zones at all.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] better than sapply

2005-08-26 Thread Omar Lakkis
I have the following two mapping data frames, (r) and (h). I want to
fill the value of r$seid with the value of h$seid where r$cid==h$cid.
I can do it with sapply as such:

 r$seid = sapply(r$cid, function(cid) h[h$cid==cid,]$seid)

Is there a better (faster) way to do this?

> r <- data.frame(seid=NA, cid= c(2181,2221,""))
> r
  seid  cid
1   NA 2181
2   NA 2221
3   NA

> h <- data.frame(seid= c(5598,5609,4931,5611,8123,8122), cid=
+ c(2219,"",2181,2190,2817,2221))
> h
  seid  cid
1 5598 2219
2 5609
3 4931 2181
4 5611 2190
5 8123 2817
6 8122 2221
  
to get the desired result of:
> r
  seid  cid
1 4931 2181
2 8122 2221
3 5609

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] better than sapply

2005-08-26 Thread Gabor Grothendieck
I don't know if its faster but you could try timing this to find out:

r$seid <- merge(h, r, by = "cid")[,2]
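[Editor's note: another vectorised possibility, not from the thread --
match() does the same lookup while keeping the rows of r in their
original order:]

```r
r <- data.frame(seid = NA, cid = c(2181, 2221, ""))
h <- data.frame(seid = c(5598, 5609, 4931, 5611, 8123, 8122),
                cid  = c(2219, "", 2181, 2190, 2817, 2221))
# match() returns, for each r$cid, the position of the same cid in h
r$seid <- h$seid[match(r$cid, h$cid)]
r$seid                             # 4931 8122 5609
```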


On 8/27/05, Omar Lakkis [EMAIL PROTECTED] wrote:
 I have the following two mapping data frames, (r) and (h). I want to
 fill the value of r$seid with the value of h$seid where r$cid==h$cid.
 I can do it with sapply as such:
 
  r$seid = sapply(r$cid, function(cid) h[h$cid==cid,]$seid)
 
 Is there a better (faster) way to do this?
 
  r <- data.frame(seid=NA, cid= c(2181,2221,""))
  r
   seid  cid
 1   NA 2181
 2   NA 2221
 3   NA
 
  h <- data.frame(seid= c(5598,5609,4931,5611,8123,8122), cid=
  c(2219,"",2181,2190,2817,2221))
  h
   seid  cid
 1 5598 2219
 2 5609
 3 4931 2181
 4 5611 2190
 5 8123 2817
 6 8122 2221
 
 to get the desired result of:
  r
   seid  cid
 1 4931 2181
 2 8122 2221
 3 5609
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html