Re: [R] Quick partitioning
On Thu, 25 Aug 2005, Anna Oganyan wrote:

> Hello, I am quite new to R, and I have one problem: I have large d-dimensional data sets (d = 2, 3, 6, 10). I would like to divide the d-dim space into n (n may be 10, but preferably some larger number, for example 20) equally sized d-dim hypercubes and count how many data points fall in each cube. Is there any way to do it quickly, I mean in a reasonable time? Actually, I want to get some rough idea of the underlying densities of these data and compare them. Thanks a lot! Anna

How do you divide a 10-D space into 10 hypercubes? You need at least some of the dimensions to be undivided.

The general idea is easy: apply cut() to each dimension, so your dimensions become factors, then table() will produce the counts. That will be quick enough for millions of points.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
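The cut()/table() recipe above can be sketched as follows (my own illustration, not from the post; the data, k, and variable names are invented for the example):

```r
# bin each of d dimensions into k intervals, then cross-tabulate
set.seed(1)
d <- 3; k <- 5
x <- matrix(runif(3e5 * d), ncol = d)   # 300,000 points in the unit cube
breaks <- seq(0, 1, length.out = k + 1)
bins <- lapply(seq_len(d), function(j)
  cut(x[, j], breaks = breaks, include.lowest = TRUE))
counts <- do.call(table, bins)          # a k x k x ... x k array of counts
dim(counts)                             # 5 5 5
sum(counts)                             # 300000
```

For non-uniform data, replace `breaks` with per-dimension quantiles or an equally spaced grid over `range(x[, j])`.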
Re: [R] Fitting data to gaussian distributions
See also the book MASS, the one fitdistr() supports.

On Thu, 25 Aug 2005, Luis Gracia wrote:

> Hi again, self-answered. I took a breath and started another Google search, this time more successful. I found the following packages in case somebody has the same question: nor1mix, wle, and mixdist (I found this one to be the most useful in my case). Best, Luis
>
> Luis Gracia said the following on 08/25/05 20:31:
>> Hi! I need to fit data that shows up as two partially superimposed gaussians, i.e.
>>
>>   data <- c(rnorm(100, 5, 2), rnorm(100, -6, 1))
>>
>> to the corresponding gaussian distributions. I figured out how to do it with mle or fitdistr when only one gaussian is necessary, but not with two or more. Is there a function in R to do this? Thank you very much in advance, Luis
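For completeness, a two-component normal mixture can also be fitted by direct maximum likelihood in base R, without any of the packages mentioned (a minimal sketch of mine; the starting values and the unconstrained parameterisation are ad hoc):

```r
set.seed(42)
data <- c(rnorm(100, 5, 2), rnorm(100, -6, 1))

# negative log-likelihood; the weight goes through plogis() and the sds
# through exp() so that optim() can work unconstrained
nll <- function(p) {
  w <- plogis(p[1])
  -sum(log(w * dnorm(data, p[2], exp(p[3])) +
           (1 - w) * dnorm(data, p[4], exp(p[5]))))
}
fit <- optim(c(0, 4, log(2), -5, 0), nll)
est <- c(w   = plogis(fit$par[1]),
         mu1 = fit$par[2], sd1 = exp(fit$par[3]),
         mu2 = fit$par[4], sd2 = exp(fit$par[5]))
round(est, 2)   # roughly w = 0.5, mu1 = 5, sd1 = 2, mu2 = -6, sd2 = 1
```

An EM algorithm (as in the packages above) is more robust for badly separated components; direct optimisation is fine for a quick fit like this one.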
Re: [R] question about coxph (was covariance matrix under null)
You will need to tell us of _what_ you want the covariance matrix, and what you mean by the `null hypothesis': coxph does estimation, not testing. If you want the covariance matrix of the parameter estimates, see vcov(), which has a coxph method.

On Thu, 25 Aug 2005, Devarajan, Karthik wrote:

> Hello. I am fitting a Cox PH model using the function coxph(). Does anyone know how to obtain the estimate of the covariance matrix under the null hypothesis? The function coxph.detail() does not seem to be useful for this purpose. Thanks, KD.
>
> [[alternative HTML version deleted]]

PLEASE do (no HTML mail, useful subject, please)
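The vcov() suggestion can be sketched as follows (my own illustration, using the lung data shipped with the survival package):

```r
library(survival)

# fit a Cox proportional hazards model
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# covariance matrix of the coefficient estimates (age, sex)
vcov(fit)

# standard errors recovered from its diagonal, matching summary(fit)
sqrt(diag(vcov(fit)))
```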
Re: [R] Help about R
>>>>> "Don" == Don MacQueen <[EMAIL PROTECTED]>
>>>>>     on Thu, 25 Aug 2005 08:11:42 -0700 writes:

    Don> Also, for the three dimensional graphic,
    Don>   help.search("3d")
    Don> will lead to a reference to the cloud() function in the lattice
    Don> package. I don't remember if the lattice package is installed by
    Don> default.

it is, since it's recommended. In such a situation (as in quite a few others), please consider using

  > packageDescription("lattice")
  Package: lattice
  Version: 0.12-5
  Date: 2005/08/16
  Priority: recommended
            ^^^^^^^^^^^
  Title: Lattice Graphics
  Author: Deepayan Sarkar <[EMAIL PROTECTED]>

where I've added the ^^^^^ markup.

    Don> If not, you will have to install it.
[R] learning decision trees with one's own scoring functions
Hi netters, I want to learn a decision tree from a series of instances (learning data). The packages tree and rpart can do this quite well, but the scoring functions (splitting criteria) are fixed in these packages, like the Gini index. However, I'm going to use another scoring function. At first I wanted to modify the R code of tree or rpart and put my own scoring function in, but it seems that tree and rpart perform the splitting procedure by calling external C functions, which I have no access to. So do I have to write R code from scratch to build the tree with my own scoring functions? It's a really tough task. Or are there other R packages that can do similar things with more flexible and extensible code? Thanks a lot!
Re: [R] learning decision trees with one's own scoring functions
Hello, you have access to the C code of the function in the *source* of the package. You can modify it and recompile the package and function (better, then, to give it a different name!).

Best,

Philippe Grosjean
Prof. Philippe Grosjean
Numerical Ecology of Aquatic Systems
Mons-Hainaut University, Pentagone (3D08)
Academie Universitaire Wallonie-Bruxelles
8, av du Champ de Mars, 7000 Mons, Belgium
phone: +32.65.37.34.97, fax: +32.65.37.30.54
email: [EMAIL PROTECTED]
web: http://www.umh.ac.be/~econum, http://www.sciviews.org

zhihua li wrote:

> [original question about replacing the splitting criteria in tree/rpart snipped]
[R] Truncate levels to use randomForest
Hi, I will explain my problem with this example:

  library(randomForest)
  # load the iris plant data set
  dataset <- iris
  numberarray <- array(1:nrow(dataset), nrow(dataset), 1)
  # include only instances with Species == "setosa" or "virginica"
  indices <- t(numberarray[(dataset$Species == "setosa" |
                            dataset$Species == "virginica") == TRUE])
  finaldataset <- dataset[indices, ]
  # just to let you see the 3 classes
  levels(finaldataset$Species)
  # create the random forest
  randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
  # The error message I get:
  # Error in randomForest.default(m, y, ...) : Can't have empty classes in y.

The problem is that finaldataset doesn't contain any instances of versicolor, so I think the only way to solve this problem is by changing the levels of Species to only setosa and virginica; correct me if I'm wrong. So I tried to change the levels but I got stuck:

  # get the possible unique classes
  uniqueItems <- unique(levels(finaldataset$Species))
  # the problem!
  newlevels <- list(uniqueItems[1] = c(uniqueItems[1], uniqueItems[2]),
                    uniqueItems[3] = uniqueItems[3])
  # Error: syntax error

In the help they use constant names to rename the levels, so this works (but that's not what I want, because I don't want to change the code every time I use another data set):

  newlevels <- list(setosa = c(uniqueItems[1], uniqueItems[2]),
                    virginica = uniqueItems[3])
  levels(finaldataset$Species) <- newlevels
  levels(finaldataset$Species)
  finaldataset$Species

Thanks in advance, Martin
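For the archives: the empty-class problem is usually solved by re-creating the factor so the unused level is dropped, and computed level names can be attached with names() rather than on the left of "=" (a sketch of mine, not from the thread):

```r
# drop the unused "versicolor" level by re-creating the factor
finaldataset <- subset(iris, Species %in% c("setosa", "virginica"))
finaldataset$Species <- factor(finaldataset$Species)
levels(finaldataset$Species)   # "setosa" "virginica"

# merging levels under computed names: build the list first, then name it
u <- levels(iris$Species)
newlevels <- list(c(u[1], u[2]), u[3])
names(newlevels) <- c(u[1], u[3])   # names need not be constants here
```

After the factor() call, randomForest() no longer sees an empty class in y.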
Re: [R] A. Mani : Tapply
Hi

PLEASE do read the posting guide!

On 25 Aug 2005 at 20:28, A Mani wrote:

> Hello, is it safe to use tapply when the result will be of dim 2 x 1 or more? On my PC R crashes.

Or gave you an error message? I tried this:

  df <- data.frame(A = as.factor(sample(1:1, 10, replace = T)),
                   B = as.factor(sample(10:11, 10, replace = T)),
                   num = rnorm(10))
  ttt <- tapply(df$num, list(df$A, df$B), diff)

and got

  Error: cannot allocate vector of size 390664 Kb
  In addition: Warning messages:
  1: Reached total allocation of 1000Mb: see help(memory.size)
  2: Reached total allocation of 1000Mb: see help(memory.size)

and the results with smaller data sets are

  df1 <- data.frame(A = as.factor(sample(1:2, 10, replace = T)),
                    B = as.factor(sample(10:11, 10, replace = T)),
                    num = rnorm(10))
  ttt1 <- tapply(df1$num, list(df1$A, df1$B), diff)
  ttt1
       10             11
  1    Numeric,24933  Numeric,25141
  2    Numeric,24992  Numeric,24930

  df <- data.frame(A = as.factor(sample(1:1000, 10, replace = T)),
                   B = as.factor(sample(1:11000, 10, replace = T)),
                   num = rnorm(10))
  ttt <- tapply(df$num, list(df$A, df$B), diff)
  ttt[1:2, 1:5]
    1    10001 10002 10003     10004
  1 NULL NULL  NULL  Numeric,0 NULL
  2 NULL NULL  NULL  Numeric,0 NULL

so you are probably receiving a humongous table of NULLs, zeros, and a few nonzero entries. You probably need to use a different approach.

Cheers
Petr

> The code used was on a 3-col data frame with two factor cols and a numeric column. The fn was diff, the data being of the form A, B, Num:
>   tapply(data$A, list(data$A, data$B), diff)
> -- A. Mani, Member, Cal. Math. Soc

Petr Pikal [EMAIL PROTECTED]
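One such different approach is to form only the factor combinations that actually occur, with split(..., drop = TRUE), instead of letting tapply allocate the full cross-table (my own sketch, using random data in the shape the poster describes):

```r
set.seed(1)
df <- data.frame(A = factor(sample(1:1000, 10, replace = TRUE)),
                 B = factor(sample(1:11000, 10, replace = TRUE)),
                 num = rnorm(10))

# drop = TRUE keeps only observed (A, B) cells: at most nrow(df) of them,
# rather than a levels(A) x levels(B) table full of NULLs
res <- lapply(split(df$num, list(df$A, df$B), drop = TRUE), diff)
length(res)   # number of observed (A, B) combinations
```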
Re: [R] Code in lattice::dotplot function.
Deepayan Sarkar wrote:

> On 8/25/05, ernesto <[EMAIL PROTECTED]> wrote:
>> Hi, I'm trying to understand the code of lattice functions so that I can write some S4 methods using lattice. The following code is a snippet of dotplot that is reused in several other functions. I don't understand why this is needed; can someone help?
>
> It was a hack to enable usage of the form dotplot(x). The latest version of lattice does not use this sort of construct any more (replacing it by generics and methods).
>
> Deepayan

OK, thanks. EJ
Re: [R] chisq.,test`
On Fri, 2005-08-26 at 18:15 +1000, Stephen Choularton wrote:

> Hi, I am trying to do this: <snip> but I keep on getting a syntax error. Is this too long, or somehow badly formed?

I get no syntax error on R 2.1.1-patched. I do get an error though:

  Error in chisq.test(c(11, 13, 12, 18, 21, 43, 15, 12, 9, 10, 5, 28, 22, :
    probabilities must sum to 1.

which leads us to question the values you provided as argument p (which I assigned to a vector P first):

  > sum(P)
  [1] 0.999783

So you need to make sure your probabilities sum to one.

HTH

PS. It would have been helpful if you'd posted the syntax error you received and your version of R -- type version at the R prompt.

Gav

--
Gavin Simpson                    [T] +44 (0)20 7679 5522
ENSIS Research Fellow            [F] +44 (0)20 7679 7565
ENSIS Ltd. ECRC                  [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography      [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                   [W] http://www.ucl.ac.uk/~ucfagls/
London. WC1H 0AP.
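The fix amounts to two lines (my illustration with a small made-up vector; the poster's full data behaves the same way):

```r
counts <- c(11, 13, 12, 18)
p <- c(0.30, 0.25, 0.25, 0.199)  # sums to 0.999: chisq.test() would stop
p <- p / sum(p)                  # rescale so the probabilities sum to 1
chisq.test(counts, p = p)
```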
[R] chisq.,test`
Hi I am trying to do this: chisq.test(c(11, 13, 12, 18, 21, 43, 15, 12, 9, 10, 5, 28, 22, 11, 15, 11, 18, 28, 16, 8, 15, 19, 44, 18, 11, 23, 15, 23, 2, 5, 4, 14, 3, 22, 9, 0, 6, 19, 15, 32, 3, 16, 14, 10, 24, 16, 24, 31, 29, 28, 16, 26, 11, 11, 4, 17, 16, 13, 20, 26, 16, 19, 34, 19, 17, 14, 22, 25, 17, 12, 23, 14, 19, 30, 18, 10, 23, 21, 17, 16, 10, 14, 6, 17, 17, 10, 21, 25, 20, 4, 11, 4, 11, 12, 21, 13, 12, 21, 8, 20, 8, 13, 18, 24, 19, 23, 17, 17, 10, 12, 5, 16, 1, 8, 14, 39, 13, 6, 6, 23, 11, 23, 21, 16, 18, 12, 16, 6, 8, 5, 9, 10, 9, 11, 20, 8, 12, 0, 14, 24, 30, 17, 17, 18, 43, 14, 36, 15, 20, 10, 19, 14, 19, 28, 21, 18, 24, 16, 13, 16, 11, 24, 28, 14, 23, 11, 19, 25, 26, 34, 16, 9, 18, 13, 27, 14, 16, 21, 10, 18, 23, 16, 39, 16, 35, 24, 32, 23, 19, 28, 19, 21, 32, 16, 21, 54, 23, 42, 33, 27, 19, 26, 22, 22, 19, 19, 19, 45, 14, 41, 15, 18, 27, 25, 23, 67, 7, 12, 26, 25, 26, 22, 12, 125, 70, 48, 17, 20, 34, 15, 25, 45, 32, 42, 29, 107, 14, 17, 39, 34, 45, 65, 46, 27, 31, 28, 26, 27, 24, 39, 26, 16, 28, 28, 17, 32, 32, 35, 25, 44, 21, 20, 21, 32, 30, 29, 18, 86, 32, 56, 30, 31, 35, 17, 19, 27, 22, 23, 44, 15, 18, 34, 32, 21, 32, 22, 19, 34, 21, 8, 19, 32, 31, 35, 19, 16, 17, 36, 39, 29, 31, 42, 30, 42, 22, 25, 20, 52, 21, 36, 27, 13, 27, 118, 18, 12, 19, 16, 60, 22, 25, 22, 24, 11, 53, 1, 27, 34, 39, 43, 22, 21, 36, 43, 20, 49, 45, 34, 95, 10, 13, 27, 21, 20, 39, 26, 27, 23, 24, 20, 21, 14, 26, 31, 31, 23, 24, 19, 52, 50, 26, 16, 78, 29, 34, 15, 59, 20, 43, 10, 29, 41, 23, 26, 26, 27, 8, 35, 34, 16, 42, 31, 35, 36, 33, 43, 19, 24, 21, 24, 18, 32, 33, 35, 17, 13, 25, 34, 7, 54, 34, 92, 27, 24, 19, 13, 16, 22, 18, 15, 19, 17, 31, 14, 32), p=c(0.0016, 0.002752, 0.001728, 0.0016, 0.001792, 0.005953, 0.006081, 0.001792, 0.002432, 0.0016, 0.001984, 0.002624, 0.001984, 0.002432, 0.001728, 0.002304, 0.002176, 0.003008, 0.001792, 0.0016, 0.0016, 0.001664, 0.0032, 0.003328, 0.0016, 0.002112, 0.001472, 0.001728, 0.001728, 0.002432, 0.001856, 0.0016, 0.001536, 0.002624, 
0.003072, 0.001664, 0.002624, 0.003264, 0.004417, 0.002816, 0.001536, 0.00288, 0.002048, 0.002368, 0.002624, 0.002048, 0.003008, 0.002432, 0.003008, 0.003584, 0.001856, 0.00256, 0.001728, 0.001856, 0.002048, 0.00288, 0.001984, 0.0016, 0.001728, 0.002432, 0.001472, 0.001792, 0.003328, 0.003136, 0.00256, 0.002496, 0.00192, 0.002048, 0.00192, 0.001792, 0.002432, 0.0016, 0.001728, 0.002624, 0.0016, 0.001728, 0.001728, 0.001664, 0.002304, 0.001472, 0.002176, 0.0016, 0.000512, 0.001792, 0.001856, 0.002432, 0.0016, 0.001856, 0.001856, 0.001664, 0.001856, 0.00192, 0.00128, 0.001664, 0.00256, 0.001728, 0.001856, 0.002176, 0.001472, 0.001728, 0.001664, 0.001408, 0.001728, 0.002688, 0.00224, 0.004289, 0.001728, 0.001408, 0.001984, 0.001344, 0.001664, 0.001024, 0.001984, 0.001536, 0.0016, 0.008066, 0.001152, 0.001472, 0.001536, 0.002048, 0.001792, 0.003969, 0.002432, 0.00192, 0.003008, 0.003008, 0.00224, 0.001856, 0.001344, 0.0016, 0.00224, 0.002176, 0.001664, 0.0016, 0.001664, 0.001664, 0.001664, 0.00128, 0.002048, 0.002432, 0.002944, 0.001728, 0.00192, 0.001984, 0.003328, 0.001536, 0.002624, 0.001984, 0.001728, 0.002176, 0.00192, 0.001856, 0.001984, 0.00224, 0.001856, 0.0016, 0.001792, 0.001344, 0.001728, 0.001856, 0.001536, 0.001792, 0.00256, 0.002112, 0.002816, 0.001792, 0.001728, 0.001984, 0.002048, 0.002752, 0.001984, 0.001728, 0.001536, 0.002432, 0.00224, 0.001664, 0.001664, 0.001856, 0.0016, 0.002176, 0.001792, 0.001472, 0.003648, 0.00192, 0.003264, 0.001792, 0.003136, 0.00224, 0.002112, 0.002368, 0.001664, 0.001792, 0.003328, 0.002368, 0.001792, 0.007553, 0.002944, 0.004033, 0.002752, 0.002944, 0.001856, 0.002304, 0.00224, 0.001728, 0.0016, 0.001344, 0.002368, 0.003392, 0.001856, 0.003584, 0.001408, 0.001984, 0.003136, 0.00192, 0.00288, 0.005633, 0.001408, 0.001664, 0.00288, 0.0032, 0.002752, 0.002304, 0.001664, 0.00973, 0.005633, 0.003392, 0.001664, 0.001792, 0.002432, 0.002048, 0.001984, 0.003392, 0.002368, 0.004929, 0.002176, 0.012739, 0.001984, 0.001664, 0.00352, 
0.004161, 0.004993, 0.004865, 0.004417, 0.002368, 0.0032, 0.001792, 0.002112, 0.00224, 0.0016, 0.003328, 0.002304, 0.001344, 0.001856, 0.003072, 0.001664, 0.002432, 0.002496, 0.002688, 0.002176, 0.003008, 0.001664, 0.001856, 0.0016, 0.002624, 0.002176, 0.002368, 0.001472, 0.008962, 0.002496, 0.003904, 0.002368, 0.002496, 0.002624, 0.001472, 0.002048, 0.002048, 0.001536, 0.002304, 0.003072, 0.001536, 0.0016, 0.002368, 0.002304, 0.00192, 0.002304, 0.0016, 0.001536, 0.003008, 0.0016, 0.001984, 0.002176, 0.004993, 0.002368, 0.002304, 0.001344, 0.002368, 0.001728, 0.003136, 0.004033, 0.00224, 0.002112, 0.002944, 0.001984, 0.003392, 0.002048, 0.001984, 0.001728, 0.003776, 0.001856, 0.002752, 0.002624, 0.0016, 0.002816, 0.012163, 0.00352, 0.001344, 0.001728, 0.001856, 0.005505, 0.001664, 0.001728, 0.0016, 0.001664, 0.002112, 0.00384, 0.001792, 0.00192, 0.002368, 0.002752, 0.004097, 0.0016, 0.001536, 0.00256, 0.003264, 0.002112, 0.004353, 0.003264, 0.00224, 0.00845, 0.001408, 0.001536,
Re: [R] histogram method for S4 class.
Deepayan Sarkar wrote:

> On 8/24/05, ernesto <[EMAIL PROTECTED]> wrote:
>> Hi, I'm trying to develop a histogram method for a class called FLQuant, which is used by the package FLCore (http://flr-project.org). FLQuant is an extension of array. There is an as.data.frame method that coerces an FLQuant into a data.frame suitable for lattice plotting. The problem is that when I coerce the object and then plot it, it works, but if the coercion is applied within the histogram method it does not. See the code below (the FLCore package is here: http://prdownloads.sourceforge.net/flr/FLCore_1.0-1.tar.gz?download)
>>
>>   library(FLCore)
>>   Loading required package: lattice
>>   data(ple4)
>>   histogram(~data|year, [EMAIL PROTECTED])
>>   Error in inherits(x, "factor") : Object "x" not found
>>   histogram(~data|year, data=as.data.frame([EMAIL PROTECTED]))
>>
>> The catch.n slot is an FLQuant object, and the code for histogram is the following:
>>
>>   setMethod("histogram", signature(formula = "formula", data = "FLQuant"),
>>     function (formula, data = parent.frame(),
>>       allow.multiple = is.null(groups) || outer,
>>       outer = FALSE, auto.key = FALSE, aspect = "fill",
>>       panel = panel.histogram, prepanel = NULL, scales = list(),
>>       strip = TRUE, groups = NULL, xlab, xlim, ylab, ylim,
>>       type = c("percent", "count", "density"),
>>       nint = if (is.factor(x)) length(levels(x))
>>              else round(log2(length(x)) + 1),
>>       endpoints = extend.limits(range(x[!is.na(x)]), prop = 0.04),
>>       breaks = if (is.factor(x)) seq(0.5, length = length(levels(x)) + 1)
>>                else do.breaks(endpoints, nint),
>>       equal.widths = TRUE,
>>       drop.unused.levels = lattice.getOption("drop.unused.levels"),
>>       ..., default.scales = list(),
>>       subscripts = !is.null(groups), subset = TRUE)
>>     {
>>       qdf <- as.data.frame(data)
>>       histogram(formula, data = qdf, allow.multiple = allow.multiple,
>>         outer = outer, auto.key = auto.key, aspect = aspect,
>>         panel = panel, prepanel = prepanel, scales = scales,
>>         strip = strip, groups = groups, xlab = xlab, xlim = xlim,
>>         ylab = ylab, ylim = ylim, type = type, nint = nint,
>>         endpoints = endpoints, breaks = breaks,
>>         equal.widths = equal.widths,
>>         drop.unused.levels = drop.unused.levels, ...,
>>         default.scales = default.scales, subscripts = subscripts,
>>         subset = subset)
>>     })
>>
>> Any ideas?
>
> [I'm CC-ing to r-devel, please post follow-ups there]
>
> What version of lattice are you using? Please use the latest one, in which histogram is an S3 generic with only one argument, formula. The eventual solution to your problem may involve changing that, but the first question to ask is whether any other formula makes sense in your context (if not, I would rather keep one argument and dispatch on signature(formula = "FLQuant")).
>
> Disclaimer: I haven't actually had time to check out FLCore yet; I will as soon as I can.
>
> Deepayan

Hi, I've installed the version that is distributed with R-2.1.1, 0.11-8. I see there's a new version now, so I'll install it and check the results. I've developed the code a little more using the approach you use for dotplot (see below), and I know where the problem is now: I'm not able to pass the arguments nint, breaks and endpoints into the function call. I guess the problem is my programming skills :-(

Thanks, EJ

ps: I'm not a subscriber of r-devel, so I guess I'm not able to post there; anyway, I'm CC-ing there too.
setMethod("histogram", signature(formula = "formula", data = "FLQuant"),
  function (formula, data = parent.frame(),
    allow.multiple = is.null(groups) || outer,
    outer = FALSE, auto.key = FALSE, aspect = "fill",
    panel = panel.histogram, prepanel = NULL, scales = list(),
    strip = TRUE, groups = NULL, xlab, xlim, ylab, ylim,
    type = c("percent", "count", "density"),
    nint = if (is.factor(x)) length(levels(x))
           else round(log2(length(x)) + 1),
    endpoints = extend.limits(range(x[!is.na(x)]), prop = 0.04),
    breaks = if (is.factor(x)) seq(0.5, length = length(levels(x)) + 1)
             else do.breaks(endpoints, nint),
    equal.widths = TRUE,
    drop.unused.levels = lattice.getOption("drop.unused.levels"),
    ..., default.scales = list(),
    subscripts = !is.null(groups), subset = TRUE)
  {
    # need to develop further; at the moment it is not possible to
    # control nint, breaks and endpoints.
    data <- as.data.frame(data)
    dots <- list(...)
    groups <- eval(substitute(groups), data, parent.frame())
    subset <- eval(substitute(subset), data, parent.frame())
    call.list <- c(list(formula = formula, data = data, groups = groups,
      subset = subset, allow.multiple = allow.multiple, outer = outer,
      auto.key = auto.key, aspect = aspect, panel = panel,
      prepanel = prepanel, scales = scales, strip = strip, type = type,
      equal.widths = equal.widths, drop.unused.levels = drop.unused.levels,
      default.scales = default.scales, subscripts = subscripts), dots)
    # include xlab etc. if existent
    if (!missing(xlab)) call.list$xlab <- xlab
    if (!missing(ylab)) call.list$ylab <- ylab
    if (!missing(xlim)) call.list$xlim <- xlim
    if (!missing(ylim))
[R] Cumulative Function
I was wondering if anyone could advise me on a good algorithm to turn a set of data from its original form into its cumulative form. I have written a piece of code that takes the data and does essentially what a histogram function would do, except that it adds to each new bin the sum in the previous bin. Once that is done I divide by the frequency in the last bin plus 1. I know the ecdf() function exists in R, but I want to use it to fit cumulative distributions to the data, and ecdf() produces a non-subscriptable vector, so fitdistr() cannot be used on it.

Thanks for any help you can give

Mark Miller
Re: [R] chisq.,test`
On 26-Aug-05 Stephen Choularton wrote:

> Hi, I am trying to do this:
>   chisq.test(c(11, 13, 12, 18, 21, 43, 15, 12, 9, 10, 5, 28, 22, 11, 15,
>   [...]
>   7, 54, 34, 92, 27, 24, 19, 13, 16, 22, 18, 15, 19, 17, 31, 14, 32),
>   p=c(0.0016, 0.002752, 0.001728, 0.0016, 0.001792, 0.005953, 0.006081,
>   [...]
>   0.001664, 0.002304))
> but I keep on getting a syntax error. Is this too long, or somehow badly formed?

Having cut-and-pasted your data as published in your email into R-1.8.0 on Red Hat Linux 9, I encountered no syntax error and got the result:

  data: c(11, 13, 12, 18, 21, 43, 15, 12, 9, 10, 5, 28, 22, 11, 15, [...]
  X-squared = 1094.448, df = 414, p-value = 2.2e-16

Fee: for each P-value of the form x.ye-N, N beers.

Hoping this helps ... though your syntax error remains mysterious.

Best wishes, Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Aug-05 Time: 09:53:38
-- XFMail --
Re: [R] Cumulative Function
Will ?cumsum help?

Sean

On 8/26/05 5:15 AM, Mark Miller <[EMAIL PROTECTED]> wrote:

> I was wondering if anyone could advise me on a good algorithm to turn a set of data from its original form into its cumulative form. [rest of the original question snipped]
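Two base-R sketches of what cumsum() gives here (mine, not from the thread): a binned cumulative count built from hist(), and a subscriptable empirical CDF built from the sorted sample, either of which can be fed to a least-squares fit:

```r
set.seed(1)
x <- rnorm(200)

# binned: cumulative proportion per histogram bin
h <- hist(x, breaks = 20, plot = FALSE)
cdf_bins <- cumsum(h$counts) / length(x)   # last element is exactly 1

# unbinned: Fhat[i] = proportion of observations <= xs[i]
xs <- sort(x)
Fhat <- seq_along(xs) / length(xs)
```

Fitting a parametric CDF is then e.g. `nls(Fhat ~ pnorm(xs, m, s), start = list(m = 0, s = 1))` on the unbinned pair, though maximum likelihood via fitdistr() on the raw data is usually preferable.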
Re: [R] learning decision trees with one's own scoring functions
Please do study the packages you mention a great deal more carefully before posting such negative remarks about them. In particular, rpart is already fully user-extensible (and comes with a worked example), and both packages are supplied in source code on CRAN.

On Fri, 26 Aug 2005, zhihua li wrote:

> [original question about replacing the splitting criteria in tree/rpart snipped]
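The user extensibility referred to is rpart's ability to take a list of user-written init/eval/split functions as its method argument; the worked example mentioned ships in the rpart sources (tests/usersplits.R). Below is my own condensed sketch of that mechanism for an anova-style criterion, following the signatures in that example; replace the goodness computation in mysplit() with your own score (case weights are ignored here for brevity):

```r
library(rpart)

# initialisation: bookkeeping, plus a summary function used when printing
myinit <- function(y, offset, parms, wt) {
  list(y = y, parms = parms, numresp = 1, numy = 1,
       summary = function(yval, dev, wt, ylevel, digits)
         paste("mean =", format(yval), ", deviance =", format(dev)))
}

# node evaluation: a label and a deviance for one node
myeval <- function(y, wt, parms) {
  list(label = mean(y), deviance = sum((y - mean(y))^2))
}

# split scoring: for continuous x (y arrives sorted by x), goodness[i] is
# the improvement from splitting after the i-th value -- this is the place
# to plug in a custom criterion
mysplit <- function(y, wt, x, parms, continuous) {
  n <- length(y)
  left <- cumsum(y)[-n]
  k <- 1:(n - 1)
  goodness <- left^2 / k + (sum(y) - left)^2 / (n - k) - sum(y)^2 / n
  list(goodness = goodness, direction = rep(-1, n - 1))
}

fit <- rpart(mpg ~ wt + hp, data = mtcars,
             method = list(init = myinit, eval = myeval, split = mysplit))
```

rpart calls the split function once per candidate predictor and picks the split with the largest goodness, so the whole tree-growing machinery stays in C while only the criterion is user code.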
[R] Help: lda predict
Hello, I use lda (package: MASS) to obtain an lda object, then want to employ this object to do prediction for new data like below:

  predict(object, newdata, dimen = 1,
          method = c("plug-in", "predictive", "debiased"))

What is the exact difference among the three methods? What is the difference in prediction results when applying different methods?

Thank you, Shengzhe
[R] Modelling Financial Time Series with S-PLUS - Adv. Course 20th Sept '05
Insightful are now taking bookings for the Advanced Time Series Modelling course, to be held at Carlton Terrace in London SW1 on 20th September.

Advanced workshop extract for Financial Time Series Modelling:

The Advanced Time Series Course focuses on the most up-to-date theory and its application around the following topics (note that not all topics will be covered during the workshop):

1. The econometrics of transaction-level data
   - Autoregressive Conditional Duration models
   - Measuring liquidity and market pressure
   - Assessing predictability in tick-by-tick data: market depth and liquidity
2. Extracting the implied probability distributions from option prices
   - Can options help predict the future spot market?
3. The use of state space models in finance
   - Measuring herding and market sentiment
4. Factor models of asset returns
   - Statistical and economic factors
5. GMM and EMM estimation methods in S-PLUS
   - Continuous-time models in finance
   - Estimating stochastic volatility models
6. Copula methods in finance
   - Basic copula testing and estimation
   - Measuring joint risks
   - Pricing basket options

Tutor: Mark Salmon, Professor of Finance, Warwick University. Mark Salmon has many years' experience of teaching and research in financial econometrics, behavioural finance, asset pricing, Knightian uncertainty, risk management and asset management. Formerly Deutsche Morgan Grenfell Professor of Financial Markets, Cass Business School, London, Mark is also an advisor to the Bank of England and has many links with City institutions.

For full course details please go to: http://www.insightful.com/services/training/timeseries_UK05.asp

To register: http://www.insightful.com/uk/services/register_uk.asp

If you need any further information please do not hesitate to contact me.
Regards,

Kathy Kiely
Sales & Marketing Assistant
Insightful Limited
Network House, Basing View
Basingstoke, Hampshire, RG21 4HG
Tel: 01256 339822, Fax: 01256 339839
e-mail: [EMAIL PROTECTED]

===
Beyond Excel(r): Quantitative Data Analysis with Insightful
Date: Wednesday, August 31st at 4:00 PM BST
Speakers: Will Alexander, Wachovia Securities; David Smith, Insightful Corporation
http://www.insightful.com/news_events/webcasts/2005/08excel/default.asp
Re: [R] Help: lda predict
On Fri, 26 Aug 2005, Shengzhe Wu wrote:

> I use lda (package: MASS) to obtain an lda object, then want to employ this object to do prediction for new data like below:
>
>   predict(object, newdata, dimen = 1,
>           method = c("plug-in", "predictive", "debiased"))

That is not how you call it: when a character vector is given like that, those are alternatives. Do read the help page, as we ask.

> What is the exact difference among the three methods? What is the difference in prediction results when applying different methods?

This is stated on the help page. If you are unfamiliar with the area, note that the posting guide points out that MASS is support software for a book and the explanations are in the book. The help page also has references: please do read them (before posting).

PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Matrix oriented computing
Hi, I want to compute the quantiles of Chi^2 distributions with different degrees of freedom like

x <- cbind(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- rbind(1:100)
m <- qchisq(x, df)

and hoped to get back a length(df) times length(x) matrix with the quantiles. Since this does not work, I use

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- c(1:100)
m <- qchisq(x, df[1])
for (i in 2:length(df)) { m <- rbind(m, qchisq(x, df[i])) }
dim(m) <- c(length(df), length(x))

Is there a way to avoid the for loop? Thanks, Sigbert
[R] Help: lda predict
Thanks for your reply. Actually I called the function as below:

p1 = predict(object, newdata, dimen = 1)
p2 = predict(object, newdata, dimen = 1, method = "debiased")
p3 = predict(object, newdata, dimen = 1, method = "predictive")

The MAP classifications of the prediction results by any method are the same. I know what the methods "plug-in" and "debiased" mean, but what does the vague prior for the method "predictive" mean? What is vague here? Thank you, Shengzhe
Re: [R] Matrix oriented computing
On Fri, 26 Aug 2005 14:44:10 +0200 Sigbert Klinke wrote: Hi, I want to compute the quantiles of Chi^2 distributions with different degrees of freedom ... Is there a way to avoid the for loop?

You could use sapply():

m <- sapply(x, function(x) qchisq(x, df))

hth, Z
Re: [R] Matrix oriented computing
On Fri, 2005-08-26 at 14:44 +0200, Sigbert Klinke wrote: Hi, I want to compute the quantiles of Chi^2 distributions with different degrees of freedom ... Is there a way to avoid the for loop?

See ?sapply

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- c(1:100)
mat <- sapply(x, qchisq, df)
dim(mat)
[1] 100 11
str(mat)
 num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...

HTH, Marc Schwartz
Re: [R] Help: lda predict
On Fri, 26 Aug 2005, Shengzhe Wu wrote: Thanks for your reply. Actually I called the function as below:

p1 = predict(object, newdata, dimen = 1)
p2 = predict(object, newdata, dimen = 1, method = "debiased")
p3 = predict(object, newdata, dimen = 1, method = "predictive")

So why did you say something different?

The MAP classifications of the prediction results by any method are the same. I know what the methods "plug-in" and "debiased" mean, but what does the vague prior for the method "predictive" mean? What is vague here?

Please do as we ask, and read the book for which this is supporting material (on p. 339, to save you looking in the index).
Re: [R] Matrix oriented computing
I believe that the following is what you want:

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
dof <- 1:100
ans <- outer(x, dof, qchisq)
dimnames(ans) <- list(x, dof)

Note that 'df' is not a very auspicious name for an object, since it is the name of a function.

Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User)

Sigbert Klinke wrote: Hi, I want to compute the quantiles of Chi^2 distributions with different degrees of freedom ... Is there a way to avoid the for loop?
Re: [R] Help: lda predict
I compared the posteriors of these three prediction results; they are a little different. The book you mentioned should be Modern Applied Statistics with S, 4th edition. But this book has been borrowed from our university library by someone else, and I have checked the book Pattern Recognition and Neural Networks, which does not mention these three lda prediction methods. Thank you, Shengzhe
Re: [R] Help: lda predict
On Fri, 26 Aug 2005, Shengzhe Wu wrote: I compared the posteriors of these three prediction results; they are a little different. The book you mentioned should be Modern Applied Statistics with S, 4th edition. But this book has been borrowed from our university library by someone else,

So please do request it and consult it, as you are using its support software. Note that the posting guide does ask you to mention if you have no access to the references.

and I have checked the book Pattern Recognition and Neural Networks, which does not mention these three lda prediction methods.

It does, in detail, in sections 2.4 and 2.5.
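As a concrete illustration (my own example, not from the thread, which never shows data): the three method settings can be compared on the built-in iris data. The posterior probabilities typically differ slightly between "plug-in" and "predictive", while the MAP classes usually coincide, which matches what Shengzhe observed:

```r
library(MASS)  # lda() and its predict method

fit <- lda(Species ~ ., data = iris)

p1 <- predict(fit, iris, method = "plug-in")
p2 <- predict(fit, iris, method = "debiased")
p3 <- predict(fit, iris, method = "predictive")

# Posteriors differ slightly across methods ...
range(p1$posterior - p3$posterior)
# ... but the MAP classifications normally agree:
table(plug.in = p1$class, predictive = p3$class)
```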
Re: [R] covariance matrix under null
On Thu, 25 Aug 2005, Devarajan, Karthik wrote: Hello, I am fitting a Cox PH model using the function coxph(). Does anyone know how to obtain the estimate of the covariance matrix under the null hypothesis? The function coxph.detail() does not seem to be useful for this purpose.

You can evaluate the second derivative of the partial log-likelihood at any specified beta with

vcov(coxph(formula, data, iter = 0, init = beta))

e.g., if you want to get score tests. -thomas
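To make the idiom above concrete, here is a sketch (my example, not from the thread) using the lung data that ship with the survival package; `iter.max = 0` in coxph.control() spells out the `iter = 0` shorthand and stops Newton-Raphson at the initial value:

```r
library(survival)  # coxph(), coxph.control(), vcov(); lung data

# Fix the coefficients at the null value beta = 0 by doing zero
# Newton-Raphson iterations from init; vcov() then returns the
# inverse of the observed information evaluated at that beta.
fit0 <- coxph(Surv(time, status) ~ age + sex, data = lung,
              init = c(0, 0), control = coxph.control(iter.max = 0))
vcov(fit0)   # estimated covariance matrix under the null
```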
Re: [R] R-help Digest, Vol 30, Issue 26
Dear R helpers, For me (i.e. R 2.1.1 on Mac OS X), using trellis.device(postscript, onefile = FALSE, ...) with the lattice library within an R function works fine to obtain the desired graph as an EPS file, provided that: 1) the command dev.off() is not included in this function, 2) and it is issued at the command level after the function has been exited. I would like to know if there is a way to close the EPS file within the function itself, freeing the user from issuing the closing command (I already tried trellis.device() and trellis.device("null") without any success). Regards, J.-M.

Jean-Marc Ottorini, LERFoB, UMR INRA-ENGREF 1092, email [EMAIL PROTECTED], INRA - Centre de Nancy, voice +33-0383-394046, F54280 - Champenoux, fax +33-0383-394034, France
Re: [R] Matrix oriented computing
Marc Schwartz [EMAIL PROTECTED] writes:

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- c(1:100)
mat <- sapply(x, qchisq, df)
dim(mat)
[1] 100 11

outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x, df, qchisq)), 2) should be close. You should likely add dimnames.

-- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
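The transpose relationship between the two approaches is easy to check directly, using only base R:

```r
x  <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- 1:100

m.outer  <- outer(x, df, qchisq)    # length(x) x length(df) matrix
m.sapply <- sapply(x, qchisq, df)   # length(df) x length(x) matrix

# outer() puts x down the rows and df across the columns, so the two
# results are transposes of one another:
stopifnot(all.equal(m.outer, t(m.sapply)))

dim(t(m.outer))   # the length(df) x length(x) layout the poster asked for
```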
Re: [R] Shutting down a trellis plot (was R-help Digest, Vol 30, Issue 26)
I suspect you have not print()-ed your graphics; see FAQ Q7.22. It is then possible to include dev.off() within the function. E.g.

testit <- function(fn = "test.eps") {
    trellis.device(postscript, file = fn, onefile = FALSE, horizontal = FALSE)
    print(stripplot(voice.part ~ jitter(height), data = singer,
                    aspect = 1, jitter = TRUE, xlab = "Height (inches)"))
    dev.off()
}
testit()

works for me.

On Fri, 26 Aug 2005, Jean-Marc Ottorini wrote: ... I would like to know if there is a way to close the EPS file within the function itself, freeing the user from issuing the closing command (I already tried trellis.device() and trellis.device("null") without any success).
[R] passing arguments from nnet to optim
Hi everyone, According to the R reference manual, the nnet function uses the BFGS method of optim to optimize the neural network parameters. I would like, when calling the function nnet, to tell the optim function not to produce the tracing information on the progress of the optimization, or at least to reduce the frequency of the reports. I tried the following:

a) nnet default

x <- rnorm(20)
y <- seq(0, 1, length = 20)
s <- nnet(y ~ x, size = 1)
# weights:  4
initial  value 1.910932
iter  10 value 1.819382
iter  20 value 1.788736
iter  30 value 1.775778
iter  40 value 1.767771
iter  50 value 1.765063
iter  60 value 1.762631
iter  70 value 1.760670
iter  80 value 1.759349
iter  90 value 1.757801
iter 100 value 1.756290
final  value 1.756290
stopped after 100 iterations

A report is generated every 10 iterations.

b) passing the REPORT parameter to optim via the control argument

x <- rnorm(20)
y <- seq(0, 1, length = 20)
s <- nnet(y ~ x, size = 1, control = list(REPORT = 50))
# weights:  4
initial  value 1.894905
iter  10 value 1.672337
iter  20 value 1.658612
iter  30 value 1.654824
iter  40 value 1.653465
iter  50 value 1.652785
iter  60 value 1.652343
iter  70 value 1.652116
iter  80 value 1.651860
iter  90 value 1.651525
iter 100 value 1.651292
final  value 1.651292
stopped after 100 iterations

This is still producing reports every 10 iterations. Has anyone an idea how I can turn off the report generation or at least reduce its frequency? Thanks, Adi L. TARCA
Re: [R] Plotting nls
To get nice-looking plots you can use trellis plots from the lattice package. First you need:

library(lattice)

Then you can define a custom panel function that will overlay the fitted curve on top of the data points in a different color (you just need to do this once; the fit you want plotted is specified as an argument):

pred.overlay.panel <- function(x, y, fit, ...) {
    panel.grid()
    panel.xyplot(x, y, ...)
    form <- as.list(sys.call(-2))[[2]]$call$formula
    resp <- deparse(form[[2]])
    covar <- deparse(form[[3]])
    xx <- seq(min(x), max(x), len = 101)
    newdat <- data.frame(xx)
    colnames(newdat) <- covar
    panel.superpose(xx, predict(fit, newdata = newdat),
                    subscripts = 1:length(xx),
                    groups = factor(rep(2, length(xx)), levels = 1:2),
                    type = "l", ...)
}

Finally, you use the custom panel function in a call to xyplot:

xyplot(y ~ x, data = sample, panel = pred.overlay.panel, fit = fit,
       scales = list(x = list(log = TRUE)))

Note how you specify that you want the x-axis to be in log-scale with the scales parameter. Hope this helps. Ben

On 8/26/05, Lanre Okusanya [EMAIL PROTECTED] wrote: Kindly excuse a non-statistician newbie attempting to wrestle with R. This might be a relatively easy question, but I am trying to perform nls regression and plot the fitted function through the data, superimposed on the raw data. From reading the R-help, Rtips et al., I am only able to do that by extracting the parameter values manually and using them to create the plot. Is there an easier way to do this (I have ~60 plots), obtain an r^2, and also plot the x-axis in the log domain (any attempts I have tried have screwed up)?

NLS script:

fit <- nls(y ~ -emax * x^h / (ec50^h + x^h), data = sample,
           start = list(emax = 4, h = 2, ec50 = 1))
summary(fit)

Thank you all for your help. Lanre Okusanya, Pharm.D., BCPS, UB/Pfizer Pharmacometrics Fellow, University at Buffalo School of Pharmacy and Pharmaceutical Sciences, 237 Cooke Hall, Buffalo, NY 14260. Email: [EMAIL PROTECTED] Tel: (716) 645-2828 x 275 Fax: (716) 645-2886
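For simpler needs, a base-graphics sketch can avoid lattice entirely (my example with made-up data and an ordinary Emax model, not the poster's actual data; the "R^2" shown is only a crude pseudo-R^2, since nls has no true one):

```r
# Hypothetical data frame with columns x and y, roughly Emax-shaped
dat <- data.frame(x = c(0.1, 0.3, 1, 3, 10, 30),
                  y = c(0.05, 0.2, 0.9, 1.9, 3.1, 3.6))

fit <- nls(y ~ emax * x^h / (ec50^h + x^h), data = dat,
           start = list(emax = 4, h = 1, ec50 = 3))

plot(y ~ x, data = dat, log = "x")                      # raw data, log x-axis
xx <- exp(seq(log(min(dat$x)), log(max(dat$x)), length = 101))
lines(xx, predict(fit, newdata = data.frame(x = xx)))   # fitted curve overlaid

# Crude pseudo-R^2: 1 - SS(residual) / SS(total)
1 - sum(residuals(fit)^2) / sum((dat$y - mean(dat$y))^2)
```

The same plot-then-lines pair can be wrapped in a loop or a function for the ~60 plots.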
[R] basic anova and t-test question
Hello, I'm posting this to receive some comments/hints about a rather statistical than R-technical question. In an anova of an lme fit, the factor SSPos11 shows up non-significant, but in the t-tests of the summary, 2 of the 4 levels (one for contrast) are significant. See below for some truncated output. I realize that the two tests are different (F-test/t-test), but I'm looking for a meaning. Maybe you have a scenario that explains how these differences can arise and how you'd go ahead and analyse it further. When I use SSPos11 as the only fixed effect, it is not significant in either the anova or the t-tests, and a boxplot of the factor shows that the levels are all quite similar (similar variance and mean). Might the effect I observe be linked to an unbalanced design in the multifactorial model? Thanks a lot for your help, kind regards, Arne

anova(fit)
            numDF denDF  F-value p-value
(Intercept)     1   540 323.4442  <.0001
SSPos1          3   540  15.1206  <.0001
...
SSPos11         3   540   1.1902  0.3128
...

summary(fit)
Linear mixed-effects model fit by REML
 Data: d.orig
      AIC      BIC    logLik
 1007.066 1153.168 -469.5329

Random effects:
 Formula: ~1 | Method
        (Intercept)  Residual
StdDev:   0.4000478 0.4943817

Fixed effects: log(value + 7.5) ~ SSPos1 + SSPos2 + SSPos6 + SSPos7 + SSPos10 + SSPos11 + SSPos13 + SSPos14 + SSPos18 + SSPos19 + ...
                 Value  Std.Error  DF   t-value p-value
(Intercept)  2.8621811 0.23125065 540 12.376964  0.0000
SSPos1C     -0.1647937 0.06293993 540 -2.618269  0.0091
SSPos1G     -0.3448095 0.05922479 540 -5.822047  0.0000
SSPos1T      0.1083988 0.06087095 540  1.780797  0.0755
...
SSPos11C    -0.1540292 0.06171635 540 -2.495761  0.0129
SSPos11G    -0.1428980 0.05993122 540 -2.384368  0.0175
SSPos11T    -0.0039434 0.06133920 540 -0.064289  0.9488
...
Re: [R] Matrix oriented computing
On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote: outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x, df, qchisq)), 2) should be close. You should likely add dimnames.

What I find interesting is that I would have intuitively expected outer() to be faster than sapply(). However:

system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00

system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00

# No round() or t() to test for overhead
system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.01 0.00 0.02 0.00 0.00

# Bear in mind the round() on mat1 above
all.equal(mat, mat1)
[1] "Mean relative difference: 4.905485e-05"

all.equal(mat, t(mat2))
[1] TRUE

Even when increasing the size of 'df' to 1:1000:

system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.16 0.01 0.16 0.00 0.00

system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE)
[1] 0.16 0.00 0.18 0.00 0.00

# No round() or t() to test for overhead
system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.16 0.01 0.17 0.00 0.00

It also seems that, at least in this case, t() and round() do not add much overhead. Best regards, Marc
Re: [R] passing arguments from nnet to optim
On Fri, 26 Aug 2005, Tarca, Adi wrote: Hi everyone, According to the R reference manual, the nnet function uses the BFGS method of optim to optimize the neural network parameters.

What the help page says is "...: arguments passed to or from other methods." That means methods of nnet(). It also says "Optimization is done via the BFGS method of 'optim'", but it is not calling optim, rather the C code implementing optim.

Has anyone an idea how I can turn off the report generation or at least reduce its frequency?

You do it via the C code.
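For what it's worth (an addition, not part of the original exchange): if the goal is simply to silence the per-iteration reports rather than to change their frequency, nnet() itself exposes a trace argument (default TRUE) that switches the fitting output off without touching the C code:

```r
library(nnet)

set.seed(1)  # for reproducibility (not in the original post)
x <- rnorm(20)
y <- seq(0, 1, length = 20)

# trace = FALSE suppresses the "# weights", "initial value",
# "iter ... value ..." and "final value" messages entirely
s <- nnet(y ~ x, size = 1, trace = FALSE)
```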
[R] problem with certain data sets when using randomForest
Hi, Since I've had no replies to my previous post about my problem, I am posting it again in the hope someone notices it. The problem is that the randomForest function doesn't take data sets whose instances contain only a subset of all the classes. So a data set with instances that all belong to class a or b from the levels a, b and c doesn't work, because there is no instance that has class c. Is there any way to solve this problem?

library(randomForest)

# load the iris plant data set
dataset <- iris

numberarray <- array(1:nrow(dataset), nrow(dataset), 1)

# include only instances with Species = setosa or virginica
indices <- t(numberarray[(dataset$Species == "setosa" | dataset$Species == "virginica") == TRUE])
finaldataset <- dataset[indices, ]

# just to let you see the 3 classes
levels(finaldataset$Species)

# create the random forest
randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)

# The error message I get
Error in randomForest.default(m, y, ...) : Can't have empty classes in y.

# The problem is that finaldataset doesn't contain
# any instances of versicolor, so I think the only way
# to solve this problem is by changing the levels the
# Species have to only setosa and virginica;
# correct me if I'm wrong.

# So I tried to change the levels but I got stuck:

# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))

# the problem!
newlevels <- list(uniqueItems[1] = c(uniqueItems[1], uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
# Error message
Error: syntax error

# In the help they use constant names to rename the
# levels, so this works (but that's not what I want,
# because I don't want to change the code every time I
# use another data set):
newlevels <- list(setosa = c(uniqueItems[1], uniqueItems[2]), virginica = uniqueItems[3])
levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)
finaldataset$Species

Thanks in advance, Martin
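Not from the thread, but the usual fix is simpler than renaming levels by hand: re-apply factor() to the subset, which re-derives the level set from the values actually present and so drops the empty class before the data reach randomForest():

```r
library(randomForest)

finaldataset <- subset(iris, Species %in% c("setosa", "virginica"))

# factor() keeps only the levels that actually occur,
# so the empty "versicolor" class disappears:
finaldataset$Species <- factor(finaldataset$Species)
levels(finaldataset$Species)

rf <- randomForest(Species ~ ., data = finaldataset, ntree = 5)
```

This works for any data set, without hard-coding level names.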
Re: [R] Matrix oriented computing
Try profiling. Doing this many times to get an overview, e.g. for sapply with df = 1:1000:

  %self  self.seconds  %total  total.seconds  name
  98.26          6.78   98.26           6.78  FUN
   0.58          0.04    0.58           0.04  unlist
   0.29          0.02    0.87           0.06  as.vector
   0.29          0.02    0.58           0.04  names<-
   0.29          0.02    0.29           0.02  names<-.default
   0.29          0.02    0.29           0.02  names

so almost all the time is in qchisq.

On Fri, 26 Aug 2005, Marc Schwartz (via MN) wrote: What I find interesting is that I would have intuitively expected outer() to be faster than sapply(). ... It also seems that, at least in this case, t() and round() do not add much overhead.

Definitely not for such small matrices.
Re: [R] Matrix oriented computing
Dear Marc, For that matter, the loop isn't a whole lot slower (on my 3 GHz Win XP system):

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- 1:1000
system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.08 0.00 0.08 NA NA
mat <- matrix(0, 1000, 11)
system.time(for (i in 1:length(df)) mat[i,] <- qchisq(x, df[i]))
[1] 0.09 0.00 0.10 NA NA

Regards, John

John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140 x23604 http://socserv.mcmaster.ca/jfox

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz (via MN) Sent: Friday, August 26, 2005 10:26 AM To: Peter Dalgaard Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Matrix oriented computing

On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
Marc Schwartz [EMAIL PROTECTED] writes:

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- c(1:100)
mat <- sapply(x, qchisq, df)
dim(mat)
[1] 100  11
str(mat)
 num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...

outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x, df, qchisq)), 2) should be close. You should likely add dimnames.

What I find interesting is that I would have intuitively expected outer() to be faster than sapply().
However:

system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
# No round() or t() to test for overhead
system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.01 0.00 0.02 0.00 0.00
# Bear in mind the round() on mat1 above
all.equal(mat, mat1)
[1] "Mean relative difference: 4.905485e-05"
all.equal(mat, t(mat2))
[1] TRUE

Even when increasing the size of 'df' to 1:1000:

system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.16 0.01 0.16 0.00 0.00
system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE)
[1] 0.16 0.00 0.18 0.00 0.00
# No round() or t() to test for overhead
system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.16 0.01 0.17 0.00 0.00

It also seems that, at least in this case, t() and round() do not add much overhead.

Best regards, Marc

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Matrix oriented computing
Prof. Ripley, Excellent point. Neither sapply() nor outer() is the elephant in the room in this situation.

On Fri, 2005-08-26 at 16:55 +0100, Prof Brian Ripley wrote:
Try profiling. Doing this many times to get an overview, e.g. for sapply with df=1:1000:

  %self   self.seconds   %total   total.seconds   name
  98.26           6.78    98.26            6.78   FUN
   0.58           0.04     0.58            0.04   unlist
   0.29           0.02     0.87            0.06   as.vector
   0.29           0.02     0.58            0.04   names<-
   0.29           0.02     0.29            0.02   names<-.default
   0.29           0.02     0.29            0.02   names

so almost all the time is in qchisq.

On Fri, 26 Aug 2005, Marc Schwartz (via MN) wrote:

On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
Marc Schwartz [EMAIL PROTECTED] writes:

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- c(1:100)
mat <- sapply(x, qchisq, df)
dim(mat)
[1] 100  11
str(mat)
 num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...

outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x, df, qchisq)), 2) should be close. You should likely add dimnames.

What I find interesting is that I would have intuitively expected outer() to be faster than sapply().
However:

system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
# No round() or t() to test for overhead
system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.01 0.00 0.02 0.00 0.00
# Bear in mind the round() on mat1 above
all.equal(mat, mat1)
[1] "Mean relative difference: 4.905485e-05"
all.equal(mat, t(mat2))
[1] TRUE

Even when increasing the size of 'df' to 1:1000:

system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.16 0.01 0.16 0.00 0.00
system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE)
[1] 0.16 0.00 0.18 0.00 0.00
# No round() or t() to test for overhead
system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.16 0.01 0.17 0.00 0.00

It also seems that, at least in this case, t() and round() do not add much overhead.

Definitely not for such small matrices.

True, and both are C functions, which of course helps as well.

Best regards, Marc

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] parts of data frames: subset vs. [-c()]
Dear all

I have a problem with splitting up a data frame called ReVerb:

» str(ReVerb)
`data.frame':   92713 obs. of  16 variables:
 $ CHILD    : Factor w/ 7 levels "ABE","ADA","EVE",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ AGE      : Factor w/ 484 levels "1;06.00","1;06.16",..: 43 43 43 99 99 99 99 99 99 99 ...
 $ AGE_Q    : num 2.0 2.0 2.0 2.4 2.4 ...
 $ INTERVALS: num 2 2 2 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ...
 $ RND      : int 34368 38311 14949 20586 72516 27186 88019 10767 114448 86146 ...
 $ SYNTAX   : Factor w/ 17 levels "Acmp","Amats",..: 15 12 8 15 7 16 7 7 16 7 ...
 $ LEXICAL  : Factor w/ 1643 levels "$ACHE","$ACT",..: 194 803 803 294 299 803 1562 299 679 1562 ...
 $ MORPH    : Factor w/ 337 levels "$","$ =inf","$ =prs",..: 9 20 9 39 184 231 57 67 231 39 ...
 $ COMPLEM  : Factor w/ 1989 levels "$","$ V PR=Lp [1.2]",..: 203 547 220 203 1101 368 1834 1667 368 1834 ...
 $ MATRIX   : Factor w/ 906 levels "$ ???","$ be PR=Aen",..: 5 5 5 308 5 856 5 5 856 308 ...
 $ SITUATION: Factor w/ 9 levels "[imitation of Mom: you know what I said]",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ V_ANN    : int 1 1 1 4 4 4 4 3 3 3 ...
 $ QUEST    : int 0 0 0 0 0 0 0 0 0 0 ...
 $ EXCL     : int 0 0 0 1 1 1 1 0 0 0 ...
 $ U_LEN    : int 3 4 5 13 13 13 13 8 8 8 ...
 $ UTTERANCE: Factor w/ 55113 levels "","# (be)cause he wanted to .",..: 5696 39091 52180 2262 2262 2262 2262 3593 3593 3593 ...

The level causing the problem is SYNTAX:

» as.data.frame(sort(table(SYNTAX)))
              sort(table(SYNTAX))
Particles                     100
PR=N1                         144
Amats                         271
Trans_PR=A2                   787
Ditrans                      1181
Intrans_PR=A1                1399
Acmp                         2402
Trans_PR=V2                  2433
CPcmps                       2769
Vpreps                       4896
Intrans_V0                   5182
Trans_PR=L2                  7653
Intrans_PR=L1                8457
Intrans_V1                   9643
Intrans_PR=V1               14987
Trans_V1                    22288
Trans_V0                    28117

I would like to extract all cases where SYNTAX=="Ditrans" from ReVerb, store that in a file, and then generate ReVerb again without these cases and factor levels.
My problem is probably obvious from the following lines of code:

» ditrans <- which(SYNTAX=="Ditrans")
» ReVerb1 <- ReVerb[-c(ditrans),]; dim(ReVerb1)
[1] 91532    16
»
» # ok, so the 92713-91532=1181 cases where SYNTAX=="Ditrans" have been removed, but ...
»
» ReVerb1 <- subset(ReVerb, SYNTAX!="Ditrans"); dim(ReVerb1)
[1] 91528    16
»
» # ... so why don't I get 91532 again as the number of rows?

Any ideas??

» R.version   # on Windows XP with Service Pack 2
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.1
year     2005
month    06
day      20
language R

Thanks a lot, STG -- Stefan Th. Gries Max Planck Inst. for Evol. Anthropology http://people.freenet.de/Stefan_Th_Gries

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] problem with certain data sets when using randomForest
Look at ?"[.factor":

finaldataset$Species <- finaldataset$Species[, drop=TRUE]

solves this.

On Fri, 26 Aug 2005, Martin Lam wrote:

Hi, Since I've had no replies on my previous post about my problem I am posting it again in the hope someone notices it. The problem is that the randomForest function doesn't take datasets whose instances contain only a subset of all the classes. So a dataset with instances that belong to either class a or b from the levels a, b and c doesn't work, because there is no instance that has class c. Is there any way to solve this problem?

library(randomForest)
# load the iris plant data set
dataset <- iris
numberarray <- array(1:nrow(dataset), nrow(dataset), 1)
# include only instances with Species == setosa or virginica
indices <- t(numberarray[(dataset$Species == "setosa" | dataset$Species == "virginica") == TRUE])
finaldataset <- dataset[indices,]
# just to let you see the 3 classes
levels(finaldataset$Species)
# create the random forest
randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
# The error message I get
Error in randomForest.default(m, y, ...) : Can't have empty classes in y.
# The problem is that the finaldataset doesn't contain
# any instances of versicolor, so I think the only way
# to solve this problem is by changing the levels the
# Species have to only setosa and virginica,
# correct me if I'm wrong.
# So I tried to change the levels but I got stuck:
# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))
# the problem!
newlevels <- list(uniqueItems[1] = c(uniqueItems[1], uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
# Error message
Error: syntax error
# In the help they use constant names to rename the
# levels, so this works (but that's not what I want
# because I don't want to change the code every time I
# use another data set):
newlevels <- list(setosa = c(uniqueItems[1], uniqueItems[2]), virginica = uniqueItems[3])
levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)
finaldataset$Species

Thanks in advance, Martin

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK  Fax: +44 1865 272595

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Matrix oriented computing
On 8/26/05, Marc Schwartz (via MN) [EMAIL PROTECTED] wrote:

On Fri, 2005-08-26 at 15:25 +0200, Peter Dalgaard wrote:
Marc Schwartz [EMAIL PROTECTED] writes:

x <- c(0.005, 0.010, 0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975, 0.99, 0.995)
df <- c(1:100)
mat <- sapply(x, qchisq, df)
dim(mat)
[1] 100  11
str(mat)
 num [1:100, 1:11] 3.93e-05 1.00e-02 7.17e-02 2.07e-01 4.12e-01 ...

outer() is perhaps a more natural first try... It does give the transpose of the sapply approach, though. round(t(outer(x, df, qchisq)), 2) should be close. You should likely add dimnames.

What I find interesting, is that I would have intuitively expected outer() to be faster than sapply(). However:

system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE)
[1] 0.01 0.00 0.01 0.00 0.00
# No round() or t() to test for overhead
system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.01 0.00 0.02 0.00 0.00
# Bear in mind the round() on mat1 above
all.equal(mat, mat1)
[1] "Mean relative difference: 4.905485e-05"
all.equal(mat, t(mat2))
[1] TRUE

Even when increasing the size of 'df' to 1:1000:

system.time(mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 0.16 0.01 0.16 0.00 0.00
system.time(mat1 <- round(t(outer(x, df, qchisq)), 2), gcFirst = TRUE)
[1] 0.16 0.00 0.18 0.00 0.00
# No round() or t() to test for overhead
system.time(mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 0.16 0.01 0.17 0.00 0.00

It also seems that, at least in this case, t() and round() do not add much overhead.

You might need to do it repeatedly to get a more reliable reading.
When I do it 1000 times outer is faster than sapply, though not by much:

n <- 1000
system.time(for (i in 1:n) mat <- sapply(x, qchisq, df), gcFirst = TRUE)
[1] 14.05  0.00 14.43    NA    NA
system.time(for (i in 1:n) mat2 <- outer(x, df, qchisq), gcFirst = TRUE)
[1] 13.42  0.00 13.85    NA    NA

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] parts of data frames: subset vs. [-c()]
Stefan Th. Gries [EMAIL PROTECTED] writes:

Dear all

I have a problem with splitting up a data frame called ReVerb:

» str(ReVerb)
`data.frame':   92713 obs. of  16 variables:
 $ CHILD    : Factor w/ 7 levels "ABE","ADA","EVE",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ AGE      : Factor w/ 484 levels "1;06.00","1;06.16",..: 43 43 43 99 99 99 99 99 99 99 ...
 $ AGE_Q    : num 2.0 2.0 2.0 2.4 2.4 ...
 $ INTERVALS: num 2 2 2 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ...
 $ RND      : int 34368 38311 14949 20586 72516 27186 88019 10767 114448 86146 ...
 $ SYNTAX   : Factor w/ 17 levels "Acmp","Amats",..: 15 12 8 15 7 16 7 7 16 7 ...
 $ LEXICAL  : Factor w/ 1643 levels "$ACHE","$ACT",..: 194 803 803 294 299 803 1562 299 679 1562 ...
 $ MORPH    : Factor w/ 337 levels "$","$ =inf","$ =prs",..: 9 20 9 39 184 231 57 67 231 39 ...
 $ COMPLEM  : Factor w/ 1989 levels "$","$ V PR=Lp [1.2]",..: 203 547 220 203 1101 368 1834 1667 368 1834 ...
 $ MATRIX   : Factor w/ 906 levels "$ ???","$ be PR=Aen",..: 5 5 5 308 5 856 5 5 856 308 ...
 $ SITUATION: Factor w/ 9 levels "[imitation of Mom: you know what I said]",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ V_ANN    : int 1 1 1 4 4 4 4 3 3 3 ...
 $ QUEST    : int 0 0 0 0 0 0 0 0 0 0 ...
 $ EXCL     : int 0 0 0 1 1 1 1 0 0 0 ...
 $ U_LEN    : int 3 4 5 13 13 13 13 8 8 8 ...
 $ UTTERANCE: Factor w/ 55113 levels "","# (be)cause he wanted to .",..: 5696 39091 52180 2262 2262 2262 2262 3593 3593 3593 ...

The level causing the problem is SYNTAX:

» as.data.frame(sort(table(SYNTAX)))
              sort(table(SYNTAX))
Particles                     100
PR=N1                         144
Amats                         271
Trans_PR=A2                   787
Ditrans                      1181
Intrans_PR=A1                1399
Acmp                         2402
Trans_PR=V2                  2433
CPcmps                       2769
Vpreps                       4896
Intrans_V0                   5182
Trans_PR=L2                  7653
Intrans_PR=L1                8457
Intrans_V1                   9643
Intrans_PR=V1               14987
Trans_V1                    22288
Trans_V0                    28117

I would like to extract all cases where SYNTAX=="Ditrans" from ReVerb, store that in a file, and then generate ReVerb again without these cases and factor levels.
My problem is probably obvious from the following lines of code:

» ditrans <- which(SYNTAX=="Ditrans")
» ReVerb1 <- ReVerb[-c(ditrans),]; dim(ReVerb1)
[1] 91532    16
»
» # ok, so the 92713-91532=1181 cases where SYNTAX=="Ditrans" have been removed, but ...
»
» ReVerb1 <- subset(ReVerb, SYNTAX!="Ditrans"); dim(ReVerb1)
[1] 91528    16
»
» # ... so why don't I get 91532 again as the number of rows?

Any ideas??

The SYNTAX variable is not necessarily the same. Could you retry the first case with

ditrans <- which(ReVerb$SYNTAX=="Ditrans")

? Otherwise, try doing a setdiff() on the rownames of the two discrepant results and see which are the four cases that differ.

-- O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
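A hypothetical illustration (not from the thread; ReVerb itself is not available, so a small stand-in data frame is used) of the setdiff()-on-rownames check suggested above:

```r
a <- data.frame(x = 1:5)                  # rownames "1".."5"
b <- a[c(1, 2, 4), , drop = FALSE]        # some rows removed
setdiff(rownames(a), rownames(b))         # "3" "5" -- the discrepant rows
```

Because data-frame subsetting preserves the original rownames, the set difference identifies exactly which observations one result kept and the other dropped.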
[R] class 'named'
I'm working through the examples in Venables and Ripley in the 'New-style Classes' chapter. On a call to representation() in the lda example, it is unable to find the class "named". Is the class "named" defined anywhere? I've loaded the methods package, but this hasn't helped. Phineas Campbell

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Help in Compiling user-defined functions in Rpart
I have been trying to write my own user-defined function in rpart. I imitated the anova splitting rule, which is given as an example. In the work I am doing, I am calculating the concentration index (ci), which is between -1 and +1. So my deviance is given by abs(ci)*(1-abs(ci)). Now when I run rpart incorporating this user-defined function I get the following error message:

Error in user.split(yback[1:nback], wback[1:nback], xback[1:nback], parms, : unused argument(s) ( ...)

Now I am failing to identify where I am going wrong (in case I have made some mistake). So I was wondering if there is anybody who has written user-defined functions of their own, and whether there is any documentation on user-defined functions and examples.

Regards, Luwis Diya

# User-defined function #

temp.init <- function(y, offset, parms, wt) {
  if (!is.null(offset)) y <- y - offset
  if (is.matrix(y)) stop("response must be a vector")
  list(y = y, parms = 0, numy = 1, numresp = 1,
       summary = function(yval, dev, wt, ylevel, digits) {
         paste("mean=", format(signif(yval, digits)),
               "MSE=", format(signif(dev/wt, digits)),
               sep = '')
       })
}

temp.eval <- function(y, wt, parms) {
  n <- length(y)
  r <- wt
  for (i in 1:n-1) {r[i+1] = (sum(wt[1:i]) + 0.5*wt[i+1])/n}  # fractional rank
  r[1] <- 0.5*wt[1]/n
  wmean <- sum(y*wt)/sum(wt)
  ci <- 2*sum(wt*(y-wmean)*(r-0.5))/sum(wt*y)  # concentration index for socio-economic inequality
  dev <- abs(ci)*(1-abs(ci))                   # deviance following the gini impurity approach
  list(label = wmean, deviance = dev)
}

temp.split <- function(y, wt, parms, continous) {
  n <- length(y)
  r <- wt
  for (i in 1:n-1) {r[i+1] = (sum(wt[1:i]) + 0.5*wt[i+1])/n}
  r[1] <- 0.5*wt[1]/n
  wmean <- sum(y*wt)/sum(wt)
  ci <- 2*sum(wt*(y-wmean)*(r-0.5))/sum(wt*y)
  devci <- abs(ci)*(1-abs(ci))
  if (continous) {
    lss <- cumsum(wt*y)[-n]
    rss <- sum(wt*y) - lss
    lw <- cumsum(wt)[-n]
    rw <- sum(wt) - lw
    lm <- lss/lw
    rm <- rss/rw
    lcss <- cumsum(wt[1:length(lm)]*(y[1:length(lm)]-lm)*(r[1:length(lm)]-0.5))
    rcss <- sum(wt*(y-wmean)*(r-0.5)) - lcss
    lci <- 2*lcss/lss  # concentration index for left side
    rci <- 2*rcss/rss  # concentration index for right side
    devlci <- abs(lci)*(1-abs(lci))  # deviance for left side
    devrci <- abs(rci)*(1-abs(rci))  # deviance for right side
    goodness <- devci - (lw/sum(wt))*devlci - (rw/sum(wt))*devrci
    list(goodness = goodness, direction = sign(lci))
  } else {
    ux <- sort(unique(x))
    wtsum <- tapply(wt, x, sum)
    ysum <- tapply(wt*y, x, sum)
    means <- ysum/wtsum
    ord <- order(means)
    n <- length(ord)
    lss <- cumsum(ysum[ord])[-n]
    rss <- sum(ysum) - lss
    lw <- cumsum(wtsum[ord])[-n]
    rw <- sum(wtsum) - lw
    lm <- lss/lw
    rm <- rss/rw
    lysum <- tapply(wt*(y-lm)*(r-0.5), x, sum)
    lcss <- cumsum(lysum[ord])[-n]
    rcss <- sum(lysum) - lcss
    lci <- 2*lcss/lss
    rci <- 2*rcss/rss
    devlci <- abs(lci)*(1-abs(lci))
    devrci <- abs(rci)*(1-abs(rci))
    goodness <- devci - 0.5*(lw/sum(wt))*devlci - 0.5*(rw/sum(wt))*devrci
    list(goodness = goodness, direction = sign(lci))
  }
}

alist <- list(eval = temp.eval, split = temp.split, init = temp.init)
tree <- rpart(u ~ pcares + antcare.skilled + riskintb + child.born + married + mage1 + mage2,
              weights = popweight, method = alist)

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Unpaste Problem
Hello, Easy ways to unpaste?

xp <- paste(x2, x3)   # x2, x3 are two non-numeric columns
. . .
xfg <- data.frame(xp, sc1, sc2, sc3)   # sc1, sc2, sc3 are numeric cols

I want xp to be split up to form a new dataframe of the form (x3, sc1, sc2, sc3). IMPORTANT info: elements of xp have the form "abc efg" (that is, "abc" from x2, a space, then "efg" from x3).

Thanks in advance, -- A. Mani Member, Cal. Math. Soc

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] compare c-index of two logistic models using rcorrp.senc() of the Hmisc library
Dear R-help, Would it be appropriate to do the following to calculate a p-value for the difference between the c-index of x1 and the c-index of x2, using the output from rcorrp.senc()?

r <- rcorrp.senc(x1, x1, y)
pValue <- 1 - pnorm((r[11]-r[12])/(r[2]/r[5])*1.96)

Osman O. Al-Radi, MD, MSc, FRCSC Chief Resident, Cardiac Surgery University of Toronto, Canada

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] compare c-index of two logistic models using rcorrp.senc() of the Hmisc library
Osman Al-Radi wrote:

Dear R-help, Would it be appropriate to do the following to calculate a p-value for the difference between the c-index of x1 and the c-index of x2, using the output from rcorrp.senc()?

r <- rcorrp.senc(x1, x1, y)
pValue <- 1 - pnorm((r[11]-r[12])/(r[2]/r[5])*1.96)

Osman O. Al-Radi, MD, MSc, FRCSC Chief Resident, Cardiac Surgery University of Toronto, Canada

Osman, Because tests for differences in two ROC areas are not very powerful, rcorrp.cens changes the hypothesis to "are predictions from one method more concordant with the outcome than predictions from the other method", within paired predictions. You can't get a difference in ROC areas from the U-statistic computed by rcorrp.cens. Frank

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] parts of data frames: subset vs. [-c()]
Are there NAs in the variable? SYNTAX=="Ditrans" and SYNTAX!="Ditrans" are not mutually exclusive.

On Fri, 26 Aug 2005, Stefan Th. Gries wrote:

Dear all

I have a problem with splitting up a data frame called ReVerb:

» str(ReVerb)
`data.frame':   92713 obs. of  16 variables:
 $ CHILD    : Factor w/ 7 levels "ABE","ADA","EVE",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ AGE      : Factor w/ 484 levels "1;06.00","1;06.16",..: 43 43 43 99 99 99 99 99 99 99 ...
 $ AGE_Q    : num 2.0 2.0 2.0 2.4 2.4 ...
 $ INTERVALS: num 2 2 2 2.25 2.25 2.25 2.25 2.25 2.25 2.25 ...
 $ RND      : int 34368 38311 14949 20586 72516 27186 88019 10767 114448 86146 ...
 $ SYNTAX   : Factor w/ 17 levels "Acmp","Amats",..: 15 12 8 15 7 16 7 7 16 7 ...
 $ LEXICAL  : Factor w/ 1643 levels "$ACHE","$ACT",..: 194 803 803 294 299 803 1562 299 679 1562 ...
 $ MORPH    : Factor w/ 337 levels "$","$ =inf","$ =prs",..: 9 20 9 39 184 231 57 67 231 39 ...
 $ COMPLEM  : Factor w/ 1989 levels "$","$ V PR=Lp [1.2]",..: 203 547 220 203 1101 368 1834 1667 368 1834 ...
 $ MATRIX   : Factor w/ 906 levels "$ ???","$ be PR=Aen",..: 5 5 5 308 5 856 5 5 856 308 ...
 $ SITUATION: Factor w/ 9 levels "[imitation of Mom: you know what I said]",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ V_ANN    : int 1 1 1 4 4 4 4 3 3 3 ...
 $ QUEST    : int 0 0 0 0 0 0 0 0 0 0 ...
 $ EXCL     : int 0 0 0 1 1 1 1 0 0 0 ...
 $ U_LEN    : int 3 4 5 13 13 13 13 8 8 8 ...
 $ UTTERANCE: Factor w/ 55113 levels "","# (be)cause he wanted to .",..: 5696 39091 52180 2262 2262 2262 2262 3593 3593 3593 ...

The level causing the problem is SYNTAX:

» as.data.frame(sort(table(SYNTAX)))
              sort(table(SYNTAX))
Particles                     100
PR=N1                         144
Amats                         271
Trans_PR=A2                   787
Ditrans                      1181
Intrans_PR=A1                1399
Acmp                         2402
Trans_PR=V2                  2433
CPcmps                       2769
Vpreps                       4896
Intrans_V0                   5182
Trans_PR=L2                  7653
Intrans_PR=L1                8457
Intrans_V1                   9643
Intrans_PR=V1               14987
Trans_V1                    22288
Trans_V0                    28117

I would like to extract all cases where SYNTAX=="Ditrans" from ReVerb, store that in a file, and then generate ReVerb again without these cases and factor levels.
My problem is probably obvious from the following lines of code:

» ditrans <- which(SYNTAX=="Ditrans")
» ReVerb1 <- ReVerb[-c(ditrans),]; dim(ReVerb1)
[1] 91532    16
»
» # ok, so the 92713-91532=1181 cases where SYNTAX=="Ditrans" have been removed, but ...
»
» ReVerb1 <- subset(ReVerb, SYNTAX!="Ditrans"); dim(ReVerb1)
[1] 91528    16
»
» # ... so why don't I get 91532 again as the number of rows?

Any ideas??

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK  Fax: +44 1865 272595

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
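A small sketch (made-up data, not from the thread) of the point above: comparisons against NA yield NA, so removing rows found by which() and keeping rows via subset() can disagree when the factor has missing values.

```r
f <- factor(c("Ditrans", "Acmp", NA, "Vpreps"))
f == "Ditrans"          #  TRUE FALSE    NA FALSE
which(f == "Ditrans")   # 1 -- which() silently ignores the NA

d  <- data.frame(f)
d1 <- d[-which(f == "Ditrans"), , drop = FALSE]  # keeps the NA row: 3 rows
d2 <- subset(d, f != "Ditrans")                  # drops the NA row too: 2 rows
```

subset() treats an NA in its condition as FALSE, while negative indexing only removes the explicitly matched rows, which would account for a small discrepancy like 91532 vs 91528.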
Re: [R] class 'named'
That is one of the S4 vs R differences. See the complements. On Fri, 26 Aug 2005, Phineas Campbell wrote: I'm working through the examples in Venables and Ripley in the 'New-style Classes' chapter. On a call to representation, in the lda example, it is unable to find the class named. Is the class named defined anywhere? I've loaded the library methods but this hasn't helped. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help in Compliling user -defined functions in Rpart
On Fri, 26 Aug 2005, Luwis Tapiwa Diya wrote:

I have been trying to write my own user-defined function in rpart. I imitated the anova splitting rule, which is given as an example. In the work I am doing, I am calculating the concentration index (ci), which is between -1 and +1. So my deviance is given by abs(ci)*(1-abs(ci)). Now when I run rpart incorporating this user-defined function I get the following error message:

Error in user.split(yback[1:nback], wback[1:nback], xback[1:nback], parms, : unused argument(s) ( ...)

Now I am failing to identify where I am going wrong (in case I have made some mistake). So I was wondering if there is anybody who has written user-defined functions of their own, and whether there is any documentation on user-defined functions and examples.

There is a commented example in the tests directory (of the sources).

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK  Fax: +44 1865 272595

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] problem with certain data sets when using randomForest
Thank you for this and earlier help, Mr. Ripley. Martin

--- Prof Brian Ripley [EMAIL PROTECTED] wrote:

Look at ?"[.factor":

finaldataset$Species <- finaldataset$Species[, drop=TRUE]

solves this.

On Fri, 26 Aug 2005, Martin Lam wrote:

Hi, Since I've had no replies on my previous post about my problem I am posting it again in the hope someone notices it. The problem is that the randomForest function doesn't take datasets whose instances contain only a subset of all the classes. So a dataset with instances that belong to either class a or b from the levels a, b and c doesn't work, because there is no instance that has class c. Is there any way to solve this problem?

library(randomForest)
# load the iris plant data set
dataset <- iris
numberarray <- array(1:nrow(dataset), nrow(dataset), 1)
# include only instances with Species == setosa or virginica
indices <- t(numberarray[(dataset$Species == "setosa" | dataset$Species == "virginica") == TRUE])
finaldataset <- dataset[indices,]
# just to let you see the 3 classes
levels(finaldataset$Species)
# create the random forest
randomForest(formula = Species ~ ., data = finaldataset, ntree = 5)
# The error message I get
Error in randomForest.default(m, y, ...) : Can't have empty classes in y.
# The problem is that the finaldataset doesn't contain
# any instances of versicolor, so I think the only way
# to solve this problem is by changing the levels the
# Species have to only setosa and virginica,
# correct me if I'm wrong.
# So I tried to change the levels but I got stuck:
# get the possible unique classes
uniqueItems <- unique(levels(finaldataset$Species))
# the problem!
newlevels <- list(uniqueItems[1] = c(uniqueItems[1], uniqueItems[2]), uniqueItems[3] = uniqueItems[3])
# Error message
Error: syntax error
# In the help they use constant names to rename the
# levels, so this works (but that's not what I want
# because I don't want to change the code every time I
# use another data set):
newlevels <- list(setosa = c(uniqueItems[1], uniqueItems[2]), virginica = uniqueItems[3])
levels(finaldataset$Species) <- newlevels
levels(finaldataset$Species)
finaldataset$Species

Thanks in advance, Martin

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK  Fax: +44 1865 272595

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
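A small sketch of the drop = TRUE idiom from the fix above, using the iris data shipped with R (a simplified stand-in for the poster's numberarray subsetting):

```r
fd <- iris[iris$Species %in% c("setosa", "virginica"), ]
levels(fd$Species)                # still all 3 levels: the empty one remains
fd$Species <- fd$Species[, drop = TRUE]
levels(fd$Species)                # now only "setosa" and "virginica"
```

This drops the unused level without renaming anything, so it works unchanged for any data set, which was the poster's requirement.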
[R] Pseudo-Voigt fit
Hi, I am sorry for this question, but I am trying to speed up an application: I will need to fit many x-y data sets (input from text files) to a 4-parameter Pseudo-Voigt peak function. Until now I used a SigmaPlot macro to do it (enclosed just in case...):

peaksign(q) = if(total(q) > q[1], 1, -1)
xatymin(q,r) = xatymax(q, max(r)-r)
[Parameters]
a = if(peaksign(y) > 0, max(y), min(y)) ''Auto {{previous: 60.8286}}
b = fwhm(x, abs(y))/2 ''Auto {{previous: 0.656637}}
c = .5 ''Auto {{previous: 6.82973e-010}}
x0 = if(peaksign(y) > 0, xatymax(x,y), xatymin(x,y)) ''Auto {{previous: 3.19308}}
[Equation]
f = a*(c*(1/(1+((x-x0)/b)^2))+(1-c)*exp(-0.5*((x-x0)/b)^2))
fit f to y

This is manageable for ~100 data sets, but it looks like the next project would need to process ~1000-member sets. I am not familiar enough with R to find the right info (although I can use R in general). I am also nearly sure that there should be a solution to this task out there, ready to be modified... Could you be so kind and direct me to the right package or web site with examples?

Thank you very much

Dr. Petr Pancoska Department of Pathology SUNY Stony Brook, NY 11794 phone: (631)-444-3030

** This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by e-mail and destroy all copies of the original.

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
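Not part of the thread: one way to fit the 4-parameter pseudo-Voigt equation above in R is nls(). The simulated data, start values, and parameter guesses below are all assumptions standing in for one of the poster's text files:

```r
# The [Equation] from the SigmaPlot macro, written as an R function
pv <- function(x, a, b, c, x0)
  a * (c * (1 / (1 + ((x - x0)/b)^2)) + (1 - c) * exp(-0.5 * ((x - x0)/b)^2))

set.seed(1)
x <- seq(0, 6, length.out = 200)                          # stand-in x grid
y <- pv(x, a = 60, b = 0.65, c = 0.5, x0 = 3.2) + rnorm(200, sd = 0.3)

# Crude automatic start values, analogous to the macro's ''Auto parameters
fit <- nls(y ~ pv(x, a, b, c, x0),
           start = list(a = max(y), b = 0.5, c = 0.5,
                        x0 = x[which.max(y)]))
coef(fit)
```

Processing ~1000 files would then be a loop of read.table() plus this fit, e.g. via lapply() over list.files(), collecting coef(fit) for each.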
Re: [R] Unpaste Problem
On Fri, 2005-08-26 at 22:39 +0530, A Mani wrote:

Hello, Easy ways to unpaste?

xp <- paste(x2, x3)   # x2, x3 are two non-numeric columns
. . .
xfg <- data.frame(xp, sc1, sc2, sc3)   # sc1, sc2, sc3 are numeric cols

I want xp to be split up to form a new dataframe of the form (x3, sc1, sc2, sc3). IMPORTANT info: elements of xp have the form "abc efg", with abc in x2 and efg in x3. Thanks in advance,

I think I understand what you are trying to do. Hopefully the below may be helpful:

# Create the data frame with 3 rows
x2 <- letters[1:3]
x3 <- LETTERS[1:3]
xp <- paste(x2, x3)
sc1 <- rnorm(3)
sc2 <- rnorm(3)
sc3 <- rnorm(3)
xfg <- data.frame(xp, sc1, sc2, sc3)

xfg
   xp        sc1        sc2        sc3
1 a A  1.3479123 -1.0642578  0.2479218
2 b B -0.1586587  1.1237456 -1.3952176
3 c C  2.7807484 -0.9778066 -1.9322279

# Use strsplit() here to break apart 'xp' using " " as the split.
# Use "[" in sapply() to get the second (2) element from each
# 'xp' list pair. Note that I use as.character() here, since xfg$xp is
# a factor.
# See the output of: strsplit(as.character(xfg$xp), " ")
# for some insight into this approach
xp.split <- sapply(strsplit(as.character(xfg$xp), " "), "[", 2)

# show post-split values
xp.split
[1] "A" "B" "C"

# Now cbind it all together into a new data frame;
# don't include column 1 from xfg (xp)
xfg.new <- cbind(xp.split, xfg[, -1])

xfg.new
  xp.split        sc1        sc2        sc3
1        A  1.3479123 -1.0642578  0.2479218
2        B -0.1586587  1.1237456 -1.3952176
3        C  2.7807484 -0.9778066 -1.9322279

See ?strsplit for more information.

HTH, Marc Schwartz

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Creating factors from continuous variables
What is the quickest way to create many categorical variables (factors) from continuous variables? This is the approach that I have used:

# create sample data
N <- 20
x <- runif(N, 0, 1)

# set up ranges to define categories
x.a <- (x >= 0.0) & (x < 0.4)
x.b <- (x >= 0.4) & (x < 0.5)
x.c <- (x >= 0.5) & (x < 0.6)
x.d <- (x >= 0.6) & (x <= 1.0)

# create factors
i <- runif(N, 1, 1)
x.new <- (i*1*x.a) + (i*2*x.b) + (i*3*x.c) + (i*4*x.d)
x.factor <- factor(x.new)

I'm looking for a better / simpler / more elegant / more robust (as the number of categories increases) way to do this. I also don't like that my factor names can only be numbers in this example. I would prefer a solution that takes a form like the following (inspired by the hist function):

# define breakpoints
x.breaks = c(0, 0.4, 0.5, 0.6, 1.0)
x.factornames = c("0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0")
x.factor = unknown.function(x, x.breaks, x.factornames)

Thanks, David

P.S. Here's what I have read to try to find the answer to my problem:
* Introductory Statistics with R
* A Brief Guide to R for Beginners in Econometrics
* Econometrics in R
Re: [R] Creating factors from continuous variables
?cut

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of David James
Sent: Friday, August 26, 2005 2:00 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Creating factors from continuous variables
Re: [R] Creating factors from continuous variables
?cut

This is in `An Introduction to R', the manual which ships with R and is basic reading.

On Fri, 26 Aug 2005, David James wrote:

> What is the quickest way to create many categorical variables (factors)
> from continuous variables?

-- Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, University of Oxford
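For reference, the ?cut approach both replies point to can be sketched as follows (breaks and labels are taken from the original post; right = FALSE reproduces the poster's left-closed intervals, and include.lowest keeps the value 1.0 in the last bin):

```r
set.seed(1)
x <- runif(20, 0, 1)

x.breaks <- c(0, 0.4, 0.5, 0.6, 1.0)
x.labels <- c("0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0")

# cut() plays the role of the poster's unknown.function()
x.factor <- cut(x, breaks = x.breaks, labels = x.labels,
                right = FALSE, include.lowest = TRUE)
table(x.factor)
```

This stays robust as the number of categories grows: adding a category is one more entry in each vector, and the factor levels carry the readable range labels.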
Re: [R] Unpaste Problem
On 8/26/05, A Mani [EMAIL PROTECTED] wrote:

> Hello, Easy ways to unpaste?
>
> xp <- paste(x2, x3)  # x2, x3 are two non-numeric columns
> ...
> xfg <- data.frame(xp, sc1, sc2, sc3)  # sc1, sc2, sc3 are numeric cols
>
> I want xp to be split up to form a new data frame of the form (x3, sc1, sc2, sc3). IMPORTANT info: elements of xp have the form "abc efg" (a space in between), with "abc" from x2 and "efg" from x3.

See https://www.stat.math.ethz.ch/pipermail/r-help/2005-August/076492.html
[R] zoo, zooreg, ISOdatetime
I create a zooreg object that runs from Jan-1-2002 0:00 to Jun-1-2005 0:00...

regts.start <- ISOdatetime(2002, 1, 1, hour=0, min=0, sec=0, tz="")
regts.end <- ISOdatetime(2005, 6, 1, hour=0, min=0, sec=0, tz="")
regts.zoo <- zooreg(NA, regts.start, regts.end, deltat=3600)

Upon inspection:

> regts.zoo[1:3]
2002-01-01 00:00:00 2002-01-01 01:00:00 2002-01-01 02:00:00
                 NA                  NA                  NA
> regts.zoo[29926:29928]
2005-05-31 22:00:00 2005-05-31 23:00:00 2005-06-01 00:00:00
                 NA                  NA                  NA

However:

> summary(regts.zoo)
Error in `row.names<-.data.frame`(`*tmp*`, value = c("2002-01-01 00:00:00", ... :
        duplicate 'row.names' are not allowed

I don't understand why it claims that there are duplicate row.names. Any advice? I probably could use the aggregate function to clean this up, but I don't see why it should be needed (provided that I do things properly in the first place).

Thanks, David
[R] PLSR: model notation and reliabilities
I'm new to both R and statistics. I did my homework, I tried the archives and whatever I managed to get from the sources, but I still need assistance with the plsr package.

I have a model with 2 core determinants D1 and D2, each made of 3 indicators (D1a, D1b, D1c and so on). I also have 2 moderating variables (m1, m2), where m1 moderates D1 and m2 moderates D2. The dependent variable (Y) is also constructed from 3 indicators (Y1, Y2, Y3). Actually my model is far more complicated; I just give a simplified example here.

Which is the correct notation for the model (I'm skipping the cross-validation for the moment):

MyModel <- plsr(Y1+Y2+Y3 ~ ((D1a+D1b+D1c)*m1) + ((D2a+D2b+D2c)*m2), ncomp=2)

or:

Y  <- cbind(Y1, Y2, Y3)
X1 <- cbind(D1a, D1b, D1c)
X2 <- cbind(D2a, D2b, D2c)
MyModel <- plsr(Y ~ (X1*m1) + (X2*m2), ncomp=2)

How do I calculate the internal composite reliability (ICR)? Is the average variance extracted (AVE) what is reported as "% variance explained" in summary? I tried something like this (the model is the first notation mentioned above, and the calculations below are simplified just for clarity):

ncomp <- MyModel$ncomp
P <- MyModel$loadings[, ncomp]
Q <- MyModel$Yloadings[, ncomp]

# D1
f1 <- P["D1a"]
f2 <- P["D1b"]
f3 <- P["D1c"]
Sp  <- f1 + f2 + f3
Sp2 <- (f1^2) + (f2^2) + (f3^2)
Sth <- (1-(f1^2)) + (1-(f2^2)) + (1-(f3^2))
D1_ICR <- (Sp^2) / ((Sp^2) + Sth)
D1_AVE <- Sp2 / (Sp2 + Sth)

but the results do not seem to give me something meaningful. For example, while cronbach(cbind(D1a,D1b,D1c)) gives me 0.90, the D1_ICR computed above gives me very low numbers (< .20). Also, summary says "% variance explained" for X = 83.1 in the 1st component, while my computed D1_AVE is unacceptable (< 10%).

Where did I go wrong? Or is it just my data? Any help will be much appreciated. TIA
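For readers landing here: with standardized indicator loadings lambda_i, composite reliability and AVE are conventionally CR = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2)) and AVE = sum(lambda^2) / (sum(lambda^2) + sum(1 - lambda^2)) -- the same formulas the poster codes above -- so implausibly low values usually mean the values being plugged in are not standardized loadings. A minimal sketch (the helper name is mine, not from the pls package):

```r
# ICR and AVE from a vector of standardized loadings;
# composite_reliability() is a hypothetical helper, not a pls function
composite_reliability <- function(lambda) {
  theta <- 1 - lambda^2   # indicator error variances
  icr <- sum(lambda)^2   / (sum(lambda)^2   + sum(theta))
  ave <- sum(lambda^2)   / (sum(lambda^2)   + sum(theta))
  c(ICR = icr, AVE = ave)
}

composite_reliability(c(0.9, 0.85, 0.8))
# well-behaved loadings give ICR near 0.89 and AVE near 0.72
```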
[R] ARIMA (seasonal) backcasting interpolation
Thanks for everyone's help with zoo -- I think I've got my data set ready. (The data consists of surface weather temperatures, from 2002 to 2005, one observation per hour. Some values are missing, i.e. NA.) I have three goals:

GOAL #1: Get the data in proper time series form, preserving frequency information:

w4.ts <- as.ts(w3.zoo, frequency=(1/3600))

I hope that 1/3600 (0.0002778) is correct. I chose it because my zooreg object reported that value. This goes back to my choice of the ISOdatetime format, which required deltat=3600.

GOAL #2: Do an ARIMA analysis that takes into account seasonal variation:

a.1 <- arima(w4.ts, order=c(1,0,0), seasonal=list(order=c(0,1,0), period=12))

First, I'm not quite sure if I should set period=12 (months in a year) or period=365*24 (number of my observations in a year); the documentation was unclear to me. Second, I've noticed that the fracdiff command is useful for finding appropriate (p,d,q) values for ARIMA models, but I have not found a command that suggests reasonable values for the seasonal (p,d,q) values.

GOAL #3: Use the ARIMA analysis to fill in for NA values. (I'm not sure how to do this yet. For example, I do not know if I will need to use windowing to smooth my backcasted data.)

I would appreciate any pointers, references, or code examples. Also, the terminology of backcasting and interpolation is not perfectly clear to me. I'm certainly looking to do more than linear interpolation between data points... that's why I'm hoping that ARIMA will help. I believe I need seasonal ARIMA because there are seasonal swings in temperature.

Thanks, David
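On the period question: in arima(), period counts observations per seasonal cycle, not months, so for hourly data the daily cycle is period = 24 (an annual cycle would be 24*365, which is usually too long for arima to handle directly). A hedged sketch on simulated data (the series and parameter values here are illustrative, not the poster's weather data):

```r
set.seed(42)
hours <- 1:(24 * 14)                          # two weeks of hourly data
temp  <- 10 + 5 * sin(2 * pi * hours / 24) +  # daily temperature cycle
         arima.sim(list(ar = 0.7), n = length(hours))
temp.ts <- ts(temp, frequency = 24)           # 24 observations per "day" unit

# seasonal ARIMA with a daily period; the seasonal difference (0,1,0)
# removes the repeating daily cycle before the AR(1) part is fitted
fit <- arima(temp.ts, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 0), period = 24))
fit$coef["ar1"]   # should land near the simulated AR coefficient 0.7
```

Note that arima() tolerates internal NAs in the series, and predict(fit) forecasts forward from the fit; filling interior gaps from the fitted model is also possible via the model's state-space form (see ?KalmanSmooth), though that takes more care.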
Re: [R] How to add values on the axes of the 3D bi-variable lrm fit?
I have not seen a reply to this, so I will offer a few comments.

1. I haven't used lrm, but I assume you are referring to the copy in the Design package; I surmised that from reviewing RSiteSearch("lrm").

2. I installed Design and learned that I also needed Hmisc, so I installed that also. The example you provided was not complete in itself, so that presented another obstacle to helping you. I therefore worked through examples in the documentation for lrm, from which I learned that your fit was of class Design.

3. This led me to the documentation for plot.Design. The examples there included one that produced a perspective plot.

4. I then read the documentation for persp and learned that it has an argument 'ticktype' with a nondefault value of "detailed", which should produce what you want.

5. When I added ticktype="detailed" to the call to plot, I got an error message. I then started reading the code for plot.Design and learned that it has an argument 'perspArgs'. The documentation for plot.Design told me that this is a list containing other named arguments to be passed to 'persp'. I tried that and it worked.

6. In the future, I believe you can increase your chances of getting a useful reply quickly if you "PLEASE do read and follow the posting guide! http://www.R-project.org/posting-guide.html".

Best Wishes,
spencer graves

Jan Verbesselt wrote:

> Dear r-list,
>
> When I try to plot the following 3D lrm fit I obtain only arrows with labels on the three axes of the figure (without values).
>
> fit <- lrm(y ~ rcs(x1,knots) + rcs(x2,knots), tol=1e-14, X=T, Y=T)
> dd <- datadist(x1, x2); options(datadist='dd')
> par(mfrow=c(1,1))
> plot(fit, x1=NA, x2=NA, theta=50, phi=25)
>
> How can I add values to the axes of this plot? (axes with the range of values of each of the explanatory variables x1 & x2)
>
> Thanks, Jan
Ir. Jan Verbesselt, Research Associate
Group of Geomatics Engineering, Department Biosystems ~ M³-BIORES
Vital Decosterstraat 102, 3000 Leuven, Belgium
Tel: +32-16-329750  Fax: +32-16-329760
http://gloveg.kuleuven.ac.be/

-- Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street, Suite 700
San Jose, CA 95110, USA
[EMAIL PROTECTED]  www.pdf.com
Tel: 408-938-4420  Fax: 408-280-7915
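Spencer's step 5 can be sketched as follows (variable names follow the original post, so 'fit' is the Design-class lrm fit and this is an untested sketch rather than a verified call):

```r
library(Design)   # provides lrm() and plot.Design(); depends on Hmisc

# perspective plot of the fit over x1 and x2, with numeric axis ticks:
# plot.Design passes the perspArgs list straight through to persp(),
# and persp's ticktype="detailed" draws ticks with axis values
plot(fit, x1 = NA, x2 = NA, theta = 50, phi = 25,
     perspArgs = list(ticktype = "detailed"))
```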
Re: [R] ARIMA (seasonal) backcasting interpolation
On 8/26/05, David James [EMAIL PROTECTED] wrote:

> Thanks for everyone's help with zoo -- I think I've got my data set ready. (The data consists of surface weather temperatures, from 2002 to 2005, one observation per hour. Some values are missing, i.e. NA.)
>
> GOAL #1: Get the data in proper time series form, preserving frequency information:
>
> w4.ts <- as.ts(w3.zoo, frequency=(1/3600))
>
> I hope that 1/3600 (0.0002778) is correct. I chose it because my zooreg object reported that value. This goes back to my choice of the ISOdatetime format, which required deltat=3600.

I will just address the zoo portion of this question. In the ts class a period is a unit, so let's assume we want the resulting series to have a day represented as a unit. Then we first create a zoo series such that the integer part of the index is a day, and converting that is easy:

regts.day <- zoo(coredata(regts.zoo), as.numeric(time(regts.zoo))/(24*3600))
regts.ts <- as.ts(regts.day)

or convert it to chron first, in which case it's easy too:

library(chron)
regts.chron <- zoo(coredata(regts.zoo), as.chron(time(regts.zoo)))
regts.ts <- as.ts(regts.chron)

Of course, had chron been used in the first place, just the last line would be needed. Note that there are some issues as to which time zone is used during conversion, which I won't address here, but you can avoid them by using chron right from the beginning: then you won't have the problem with daylight savings time, you won't have the problem that the POSIXct and ts representations are very different, and you won't have to worry about time zones.
[R] better than sapply
I have the following two mapping data frames, r and h. I want to fill in the value of r$seid with the value of h$seid where r$cid == h$cid. I can do it with sapply as such:

r$seid <- sapply(r$cid, function(cid) h[h$cid == cid, ]$seid)

Is there a better (faster) way to do this?

r <- data.frame(seid = NA, cid = c("2181", "2221", ""))

> r
  seid  cid
1   NA 2181
2   NA 2221
3   NA

h <- data.frame(seid = c(5598, 5609, 4931, 5611, 8123, 8122),
                cid  = c("2219", "", "2181", "2190", "2817", "2221"))

> h
  seid  cid
1 5598 2219
2 5609
3 4931 2181
4 5611 2190
5 8123 2817
6 8122 2221

to get the desired result of:

> r
  seid  cid
1 4931 2181
2 8122 2221
3 5609
Re: [R] better than sapply
I don't know if it's faster, but you could try timing this to find out:

r$seid <- merge(h, r, by = "cid")[, 2]

On 8/27/05, Omar Lakkis [EMAIL PROTECTED] wrote:

> I have the following two mapping data frames, r and h. I want to fill in the value of r$seid with the value of h$seid where r$cid == h$cid. I can do it with sapply as such:
>
> r$seid <- sapply(r$cid, function(cid) h[h$cid == cid, ]$seid)
>
> Is there a better (faster) way to do this?
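Another vectorized alternative (not from the thread): merge() sorts its result by the 'by' column, so when the original row order of r matters, a match()-based lookup is the safer single-pass version of the sapply loop:

```r
# toy data reconstructed from the question
r <- data.frame(seid = NA, cid = c("2181", "2221", ""))
h <- data.frame(seid = c(5598, 5609, 4931, 5611, 8123, 8122),
                cid  = c("2219", "", "2181", "2190", "2817", "2221"))

# find the position of each r$cid within h$cid, then index h$seid;
# one vectorized lookup instead of one subset per row
r$seid <- h$seid[match(r$cid, h$cid)]
r$seid   # 4931 8122 5609, in r's original row order
```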