Re: [R] (no subject)
type ?par and then have a look at cex.lab and cex.main. cheers christoph [EMAIL PROTECTED] wrote: Dear ladies and gentlemen! When I use the plot function, how can I change the size of the titles for the x and y axes (xlab, ylab) and the size of the axis labels? Thank you very much. With best regards Claudia __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
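To illustrate the parameters involved (cex.axis is added here as well, since the question also asks about the size of the axis labels; all three are documented in ?par):

```r
# cex.lab scales the axis titles (xlab/ylab), cex.axis the tick-mark
# labels, cex.main the main title; all can be passed straight to plot().
plot(1:10, (1:10)^2,
     xlab = "x", ylab = "x squared", main = "cex demo",
     cex.lab  = 1.5,  # axis titles 50% larger than default
     cex.axis = 0.8,  # tick-mark labels 20% smaller
     cex.main = 2)    # main title twice the default size
```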
Re: [R] Getting eps into Word documents.
As far as I know (and I am not a specialist), there is no way of importing EPS into Word, nor into OpenOffice, if you want to see the graphics in the word processor and not only a placeholder. So either a) use formats such as bmp or png, or b) under Windows: convert the EPS to e.g. WMF using a tool such as Corel Draw, or import the EPS into PowerPoint and then save it as WMF, or c) try to import the EPS into Inkscape (I think this open-source tool is also available for Windows), but I have never tried this. I have to admit that under Linux I have found no way so far to import an EPS graphic into a Word document (neither OOo Writer nor MS Word running under Wine). Christoph tom wright wrote: On Mon, 2005-03-10 at 16:31 -0300, Rolf Turner wrote: A student in one of my courses has asked me about getting R graphics output (under Linux) into a Word document. I.e. she wants to do her R thing under Linux, but then do her word processing using Word. Scanning around the r-help archives I encountered an inquiry about this topic --- eps into Word documents --- from Paul Johnson but found no replies to it. I tried contacting him but the email address in the archives appeared not to be valid. Does anyone know a satisfactory solution to the problem of including a graphic which exists in the form of a *.eps (encapsulated postscript) file in a Word document? If so, would you be willing to share it with me and my student? If so, please be gentle in your explanation. I am not myself (repeat ***NOT***) a user of Word! Thanks. cheers, Rolf Turner [EMAIL PROTECTED] R can also create more generic image formats such as png; I just use these when I'm forced to insert graphics for presentations. ?png
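As a sketch of the png suggestion above (the file name is illustrative; PNG files import directly into Word):

```r
# Write the plot as a PNG, which Word imports natively.
# width/height are in pixels; res sets the nominal resolution in DPI.
png("myplot.png", width = 800, height = 600, res = 96)
plot(rnorm(100), main = "Exported for Word")
dev.off()  # close the device so the file is written out completely
```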
Re: [R] Creating an array [0 to 101] of a list
a <- array(vector("list", 3), dim = c(3)); a[[1]] <- list(x = 1, y = 0, z = -1); a[[2]] <- list(x = 0, y = 1, z = -1); a[[3]] <- list(x = 0, y = -1, z = 0) HTH christoph Rainer M. Krug wrote: Hi I looked, but I didn't find it: I need an array [0 to 101] where each element is a list (the result of Kest in spatstat). I.e. I want to do: A[2] <- Kest(pp1); A[3] <- Kest(pp2); ... A[101] <- Kest(pp100). How can I create A? Rainer
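Scaled up to the original request, a minimal sketch; the length 101 matches the indices used in the question, and the Kest calls are replaced by placeholder lists since spatstat point patterns are not available here:

```r
# A list of length 101 whose elements start out NULL;
# each slot can later hold an arbitrary list (e.g. a Kest result).
A <- vector("list", 101)
A[[2]] <- list(x = 1, y = 0)   # stand-in for Kest(pp1)
A[[3]] <- list(x = 0, y = 1)   # stand-in for Kest(pp2)
length(A)   # 101
A[[2]]$x    # 1
```

Note the double brackets: `A[[i]] <- ...` stores the list itself in slot i, whereas `A[i] <- ...` would try to replace a sublist.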
[R] plot.augPred sorted and labelled according second factor
Hi using this code example: library(nlme); fm1 <- lme(Orthodont, random = ~1); plot(augPred(fm1)). Is there any way to have the plots in each cell labelled and ordered according to Orthodont$Sex? I.e., in addition to the bar with the label for Orthodont$Subject, is there another bar labelling the Sex of the subject? thanks a lot christoph
[R] savePlot(type=wmf) in loop fails, since too fast
hi working with R-2.1.1 on WinXP: in a loop I draw to a trellis.device, which takes some time. After the drawing I call savePlot(). It seems the loop is too fast for the savePlot() call to finish. Is there any solution for such a problem? Calling the same steps outside the loop works fine. many thanks christoph
[R] lme model: Error in MEEM
Hi, We have data from two groups of subjects: 32 elderly, 14 young adults. For each subject we have 15 observations, each observation consisting of a reaction-time measure (RT) and an activation measure (betadlpcv). Since we want to analyze the influence of (age-)group and RT on the activation, we call: lme(betadlpcv ~ RT*group, data=our.data, random=~ RT |subject) this yields: Error in MEEM(object, conLin, control$niterEM) : Singularity in backsolve at level 0, block 1 In addition: Warning message: Fewer observations than random effects in all level 1 groups in: lme.formula(betadlpcv ~ RT * group, data = patrizia.data, random = ~RT | what's the problem here? thanks for your kind help christoph
Re: [R] lme model: Error in MEEM
sorry, RT had an error in the raw data and was treated as a factor. After correction of the raw data (RT is numeric) it now works fine. thanks a lot christoph --- Original Message --- From: Douglas Bates [EMAIL PROTECTED] To: Christoph Lehmann [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: Re: [R] lme model: Error in MEEM Date: Thu, 18 Aug 2005 08:29:52 -0500 On 8/18/05, Christoph Lehmann [EMAIL PROTECTED] wrote: [original question quoted] It seems that you only have one observation per subject and you are trying to estimate a model with two random effects per subject plus the per-observation noise term. These terms are completely confounded.
[R] nonparametric 2way repeated-measures anova
Dear useRs, is there any nonparametric test for the analysis of variance in a design with two within-factors (repeated measures on both factors)? Friedman is not appropriate here, therefore I am grateful for any alternative test. thanks for any hint cheers christoph
[R] studentized CIs for a correlation using package boot
Dear useRs I need to compute studentized confidence intervals for a correlation, using the boot library. For these CIs we need to compute a variance estimate of the statistic (here the correlation coefficient) from each bootstrap sample. There are 2 important points, I think: (1) We need to do a Fisher transformation (atanh(x)) to correct for non-normality; this can be done easily by specifying the h, hinv, and hdot parameters in the boot.ci call. (2) An estimate for the variance is (as far as I remember) (1-correlation^2)^2/n (for Fisher-transformed data, an estimator is 1/(n-3)). Do you think this is the correct way: library(boot); fisher <- function(r) 0.5*log((1+r)/(1-r)); fisher.dot <- function(r) 1/(1-r^2); fisher.inv <- function(z) (exp(2*z)-1)/(exp(2*z)+1); boot.fun <- function(data, i) { n <- length(i); correlation <- cor(data[i,1], data[i,2]); v <- (1-correlation^2)^2/n; c(correlation, v) }; td.boot <- boot(td, boot.fun, R=); boot.ci(td.boot, h = fisher, hdot = fisher.dot, hinv = fisher.inv, conf = c(0.95)) ? many thanks for your thoughts cheers christoph
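A self-contained version of the sketch above, with simulated data and a replicate count filled in for illustration (R = 999, the simulated data frame `td`, and the explicit type = "stud" are my choices, not from the original post):

```r
library(boot)

# Fisher z-transform, its derivative, and its inverse,
# passed to boot.ci() as h, hdot and hinv.
fisher     <- function(r) 0.5 * log((1 + r) / (1 - r))
fisher.dot <- function(r) 1 / (1 - r^2)
fisher.inv <- function(z) (exp(2 * z) - 1) / (exp(2 * z) + 1)

# Statistic: correlation plus a variance estimate per resample,
# the second component is what studentized ("stud") intervals need.
boot.fun <- function(data, i) {
  n <- length(i)
  correlation <- cor(data[i, 1], data[i, 2])
  v <- (1 - correlation^2)^2 / n
  c(correlation, v)
}

set.seed(1)
td <- as.data.frame(MASS::mvrnorm(100, mu = c(0, 0),
                                  Sigma = matrix(c(1, .6, .6, 1), 2)))
td.boot <- boot(td, boot.fun, R = 999)
ci <- boot.ci(td.boot, type = "stud", conf = 0.95,
              h = fisher, hdot = fisher.dot, hinv = fisher.inv)
ci$student   # confidence level plus lower/upper interval endpoints
```

Because hinv maps back through the inverse Fisher transform, the resulting endpoints stay inside (-1, 1), as a correlation interval should.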
Re: [R] Change class factor to numeric
?as.factor states: To revert a factor 'f' to its original numeric values, 'as.numeric(levels(f))[f]' is recommended and slightly more efficient than 'as.numeric(as.character(f))'. christoph Petr Pikal wrote: Hi On 6 May 2005 at 8:51, Paulo Justiniano Ribeiro Jr wrote: as.numeric() Not exactly correct. as.numeric(as.character()) gives you what you probably want, if mass really is a factor ;) see str(mass) Cheers Petr On Fri, 6 May 2005, Smit, R. (Robin) (IenT) wrote: I am attempting to develop a multiple regression model using selected model variables that should all be treated as numeric (mostly real) values. However, R automatically considers one specific variable, mass, to be of class factor, probably because mass consists of integer values that are repeated. I now want to force R to treat mass as a numeric variable in the regression but am not sure how to do this. class(mass) <- numeric does not help me. Could anyone advise me on this? Kind regards, Robin Smit TNO Science & Technology, Business Unit Automotive, Environmental Studies & Testing, PO Box 6033, 2600 JA Delft, THE NETHERLANDS
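A quick demonstration of why a plain as.numeric() is wrong on such a factor, and of the two recommended idioms (the example values are made up):

```r
# A factor built from repeated integer values, as in the question.
mass <- factor(c(1500, 1200, 1500, 1800))

as.numeric(mass)                 # wrong: returns the level codes 2 1 2 3
as.numeric(as.character(mass))   # correct: 1500 1200 1500 1800
as.numeric(levels(mass))[mass]   # correct, and slightly more efficient
```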
Re: [R] Help with R
I heard that 'R' does not do a very good job at handling large datasets, is this true? Importing huge datasets into a data.frame with e.g. a subsequent conversion of some columns into factors may lead to memory troubles (probably due to memory overhead when building the factors). But we recently succeeded in importing 12 million data records stored in a MySQL database, using the RMySQL package. The procedure which led to success was: 0) define a data.frame 'data.total' with the size necessary to keep the whole data set to be imported; then in a loop: 1) import the data in chunks of e.g. 3 records per chunk and save each chunk in a temporary data.frame 'data.chunk'; 2) the conversion into factors and other preprocessing steps, such as data aggregation, are done for each single chunk saved in 'data.chunk' after import; 3) the now preprocessed chunk is saved into the appropriate part of the data.frame 'data.total' defined at the beginning; 4) the whole dataset is imported and the data.frame 'data.total' is ready for further computational steps. In a nutshell: preprocessing steps such as conversion into factors yield memory troubles, even for data sets which per se don't take too much memory; but done separately on smaller chunks of data, it can be done with R very efficiently. The 'team' MySQL together with R is VERY powerful. Cheers Christoph
Re: [R] selections of data by one variable
test <- data.frame(cbind(1:10, 11:20)); names(test) <- c("a", "b"); test[test$b == 17, ]; test[test$b %in% c(13, 15, 17), ] Tu Yu-Kang wrote: Dear R experts, My problem is as follows: Suppose I have a data frame d comprising two variables a <- c(1:10) and b <- c(11:20). I now want to select a subgroup according to the values of b. I know if I just want to select, say, b == 17, I can use f <- d[d$b == 17, ] and R will give me f a b 7 7 17. However, if now I want to select a subgroup according to b == e <- c(13,15,17), then the same syntax doesn't work. What is the correct way to do it? My data have more than one million subjects, and I want to select part of them according to their id numbers. Your help will be highly appreciated. Best regards, Yu-Kang
[R] RMySQL installation: libz missing
Hi I run SuSE Linux 9.1 and I installed MySQL server, client, devel, bench. DBI is installed; when I try to install RMySQL I get an error saying that libz is missing. (Paths to libs were set: export PKG_CPPFLAGS=-I/usr/include/mysql/ ; export PKG_LIBS="-L/usr/lib/mysql/ -lmysqlclient") So my question: where do I get the libz files (are these MySQL files? If yes, why were they not installed at least by mysql-devel?) thanks for your kind help christoph
Re: [R] RMySQL installation: libz missing SOLVED
it seemed to be a problem with the rpm for SuSE 9.1. I installed and compiled R 2.1 from the sources, and then the installation of RMySQL succeeded. mmmh... Christoph Christoph Lehmann wrote: [original question quoted]
[R] RMySQL query: why result takes so much memory in R ?
Hi I just started with RMySQL. I have a database with roughly 12 million rows/records and 8 columns/fields. From all 12 million records I want to import 3 fields only. The fields are specified as: id int(11), group char(15), measurement float(4,2). Why does this take 1G RAM? I run R on SuSE Linux with 1G RAM, and with the code below it even fills the whole 1G of swap. I just don't understand how 12e6 * 3 values can fill such a huge range of RAM? Thanks for clarification and potential solutions. ## my code: library(RMySQL); drv <- dbDriver("MySQL"); ch <- dbConnect(drv, dbname = "testdb", user = "root", password = "mysql"); testdb <- dbGetQuery(ch, "select id, group, measurement from mydata"); dbDisconnect(ch); dbUnloadDriver(drv) ## end of my code. Cheers Christoph
[R] summary(as.factor(x) - force to not sort the result according factor levels
Hi The result of a summary(as.factor(x)) call (see example below) is sorted according to the factor levels. How can I get the result not sorted, but in the original order of the levels in x? test <- c(120402, 120402, 120402, 1323, 1323, 200393, 200393, 200393, 200393, 200393); summary(as.factor(test)) gives: 1323 120402 200393 / 2 3 5. I need: 120402 1323 200393 / 3 2 5. thanks for a hint christoph
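One way to get this (a standard idiom, not an answer from the thread itself) is to fix the level order explicitly via the levels argument of factor():

```r
test <- c(120402, 120402, 120402, 1323, 1323,
          200393, 200393, 200393, 200393, 200393)

# levels = unique(test) keeps the order of first appearance
# instead of the default sorted level order.
summary(factor(test, levels = unique(test)))
# 120402   1323 200393
#      3      2      5
```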
[R] large dataset import, aggregation and reshape
Dear useRs We have a data set (comma delimited) with 12 million rows and 5 columns (in fact many more, but we need only 4 of them): id, factor 'a' (5 levels), factor 'b' (15 levels), date-stamp, numeric measurement. We run R on SuSE Linux 9.1 with 2GB RAM (and a 3.5GB swap file). On average we have 30 obs. per id. We want to aggregate (e.g. the sum of the measurements under each factor level of 'a', and the same for factor 'b') and reshape the data so that for each id we have only one row in the final data.frame, meaning we finally have roughly 400'000 lines. I tried read.delim, used the nrows argument, defined colClasses (with an as.Date class): memory problems at the latest when calling reshape and aggregate. Also importing the date column as character and then converting the dates column using 'as.Date' didn't succeed. It seems the problematic, memory-intensive parts are: a) importing the huge data per se (but does data with dim c(12e6, 5) really take 2GB?) b) converting the time-stamp to a 'Date' class c) the aggregate and reshape task. What are the steps you would recommend? (i) using scan instead of read.delim (with or without colClasses?) (ii) importing blocks of data (e.g. 1 million lines at a time), aggregating them, then importing the next block, and so on? (iii) putting the data into a MySQL database, importing from there and doing the reshape and aggregation in R for both factors separately? thanks for hints from your valuable experience cheers christoph
Re: [R] Bootstrap / permutation textbooks
look at: AC Davison, DV Hinkley: Bootstrap Methods and Their Application. There is also an R library 'boot', based on the methods reported in this book. C Peter Soros wrote: Dear R experts, I would like to explore if and to what extent bootstrapping and permutation statistics can help me in my research (functional brain imaging). I am looking for an introductory textbook, rather legible. I have statistical knowledge, but I am definitely no statistical or mathematical guru. Do you have suggestions for a useful textbook? Thanks a lot, Peter
[R] lsfit result - how to compute t-values for coefficients
Hi I used lsfit instead of lm since I have a huge Y data set (X being constant for all Y). Since I require the t-values for all coefficients: which would be the fastest way to compute them, e.g. for the example: ## using lsfit with a matrix response: t.length <- 5; d.dim <- c(t.length, 7, 8, 9) (dimensions: time, x, y, z); Y <- array(rep(1:t.length, prod(d.dim)) + rnorm(prod(d.dim), 0, 0.1), d.dim); X <- cbind(c(1,3,2,4,5), c(1,1,1,5,5)); date(); rsq <- lsfit(X, array(c(Y), dim = c(t.length, prod(d.dim[2:4]))))$coef[2, ] (coef for first non-const pred); names(rsq) <- prod(d.dim[2:4]); rsq <- array(rsq, dim = d.dim[2:4]); date(). What would be the best way to get the t-values for all coefs, not only (as illustrated above for the beta value) for one predefined coef? many thanks christoph
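One way to get standard errors (and hence t-values) out of an lsfit result is ls.diag(); a small sketch for a single series, cross-checked against lm() (applying this per column of a response matrix is my extension, not something stated in the post):

```r
set.seed(42)
X <- cbind(c(1, 3, 2, 4, 5), c(1, 1, 1, 5, 5))
y <- 2 + 0.5 * X[, 1] - 0.3 * X[, 2] + rnorm(5, 0, 0.1)

fit  <- lsfit(X, y)
se   <- ls.diag(fit)$std.err   # standard errors of all coefficients
tval <- fit$coef / se          # t-value = estimate / standard error

# Cross-check: lm() fits the same OLS model, so the t-values must agree.
summary(lm(y ~ X))$coef[, "t value"]
tval
```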
[R] apply vs sapply vs loop - lm() call appl(y)ied on array
Dear useRs (code of the mentioned small example is below) I have 7 * 8 * 9 = 504 series of data (each of length 5). For each of these series I want to compute a lm(), where the design matrix X is the same for all these computations. The 504 series are in an array of dimension d.dim <- c(5, 7, 8, 9), meaning the first dimension holds the data series. The lm computation needs performance optimization, since in fact the dimensions are much larger. I compared the following approaches: using a for-loop, using apply, and using sapply. All of these require roughly the same computation time. I was astonished, since I expected at least sapply to outperform the for-loop. Do you have another solution for me which is faster? many thanks here is the code ## -- t.length <- 5; d.dim <- c(t.length, 7, 8, 9) (dimensions: time, x, y, z); Y <- array(rep(1:t.length, prod(d.dim)) + rnorm(prod(d.dim), 0, 0.1), d.dim); X <- c(1,3,2,4,5) ## performance tests ## using for loop: date(); z <- rep(0, prod(d.dim[2:4])); l <- 0; for (i in 1:dim(Y)[4]) for (j in 1:dim(Y)[3]) for (k in 1:dim(Y)[2]) { l <- l + 1; z[l] <- unlist(summary(lm(Y[, k, j, i] ~ X)))$r.squared }; date() ## using apply: date(); z <- apply(Y, 2:4, function(x) unlist(summary(lm(x ~ X)))$r.squared); date() ## using sapply: date(); fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4]))); z <- sapply(split(as.vector(Y), fac), FUN = function(x) unlist(summary(lm(x ~ X)))$r.squared); dim(z) <- d.dim[2:4]; date() ## -- -- Christoph Lehmann, Department of Psychiatric Neurophysiology, University Hospital of Clinical Psychiatry, Waldau, CH-3000 Bern 60, [EMAIL PROTECTED], http://www.puk.unibe.ch/cl/pn_ni_cv_cl_04.html
Re: [R] lda (MASS)
Now, I use my real dataset (900 instances, 21 attributes), whose 2 classes can be separated with an accuracy of no more than 80% (10xval) with KNN, SVM, C4.5 and the like. I think these accuracies are based on cross-validation runs, whereas the 80% accuracy you report using LDA is not based on cross-validation runs, as long as CV is not set to TRUE. PS: and does anybody know how to use the CV option of lda to do xval? I can't get it. z <- lda(Sp ~ ., Iris, CV = TRUE); table(Iris$Sp, z$class) cheers christoph
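A runnable sketch of the CV suggestion; the Iris data frame is constructed as in the examples of ?lda in MASS, and the accuracy cross-check at the end is my addition:

```r
library(MASS)

# The classic Iris data frame used in the examples of ?lda.
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                   Sp = rep(c("s", "c", "v"), rep(50, 3)))

# CV = TRUE performs leave-one-out cross-validation; the jackknifed
# class predictions come back in z$class (no predict() step needed).
z <- lda(Sp ~ ., Iris, CV = TRUE)
table(Iris$Sp, z$class)      # cross-validated confusion matrix
mean(Iris$Sp == z$class)     # leave-one-out accuracy
```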
Re: [R] apply vs sapply vs loop - lm() call appl(y)ied on array
Ok, thanks to a hint from Matthew pointing to a former post with a similar request, I now have three faster solutions (see below), the last one being the fastest, but the former two are also faster than the for-loop, apply(lm(formula)) and sapply(lm(formula)) versions in my last mail. One problem only: using lsfit I can't directly get measures such as r.squared ... --- ## using lm with a matrix response (recommended by BDR): date(); rsq <- unlist(summary(lm(array(c(Y), dim = c(t.length, prod(d.dim[2:4]))) ~ X)))[seq(22, prod(d.dim[2:4]) * 30, by = 30)] (get the r.squared list element); names(rsq) <- prod(d.dim[2:4]); rsq <- array(rsq, dim = d.dim[2:4]); date() ## using sapply and lsfit instead of lm (recommended by Kevin Wright): date(); fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4]))); z <- sapply(split(as.vector(Y), fac), FUN = function(x) lsfit(X, x)$coef[2]); dim(z) <- d.dim[2:4]; date() ## using lsfit with a matrix response: date(); rsq <- lsfit(X, array(c(Y), dim = c(t.length, prod(d.dim[2:4]))))$coef[2, ]; names(rsq) <- prod(d.dim[2:4]); rsq <- array(rsq, dim = d.dim[2:4]); date() -- thanks Christoph Wiener, Matthew wrote: Christoph -- There was just a thread on this earlier this week. You can search the archives for the title: refitting lm() with same x, different y. (Actually, it doesn't turn up in the R site search yet, at least for me. But if you just go to the archive of recent messages, available through CRAN, you can search on refitting and find it. The original post was from William Valdar, on April 19.) Hope this helps, Matt Wiener -Original Message- From: Christoph Lehmann, Sent: Thursday, April 21, 2005 9:24 AM, To: R-help@stat.math.ethz.ch, Subject: [R] apply vs sapply vs loop - lm() call appl(y)ied on array [original message quoted]
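The matrix-response trick in the first solution can be seen in isolation: lm() fits all columns of a response matrix in one call, which is where the speed-up comes from. A minimal sketch on a tiny array (dimensions shrunk for illustration):

```r
set.seed(1)
X <- c(1, 3, 2, 4, 5)
Y <- matrix(rep(1:5, 4) + rnorm(20, 0, 0.1), nrow = 5)  # 4 series, length 5

fit <- lm(Y ~ X)   # one fit for all four series
coef(fit)          # 2 x 4 matrix: intercept and slope per series

# r.squared per series, without the unlist()-and-index trick:
# summary() of a multi-response fit returns one summary per column.
sapply(summary(fit), function(s) s$r.squared)
```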
Re: [R] basic question
sapply(split(test, test$year), function(x) list(x.s = sum(x$x), y.s = sum(x$y), z.s = sum(x$z))) or, for one variable only: aggregate(test$x, list(id = test$year), sum) cheers christoph jose silva wrote: I know this question is very simple, but I cannot figure it out. I have the data frame: test <- data.frame(year = c(2000,2000,2001,2001), x = c(54,41,90,15), y = c(29,2,92,22), z = c(26,68,46,51)). I want to sum the vectors x, y and z within each year (2000 and 2001) to obtain this: year x y z / 1 2000 95 31 94 / 2 2001 105 114 97. I tried tapply but it did not work (or probably I did it wrong). Any suggestions? silva
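aggregate() can also handle all three columns in a single call, which gives exactly the data frame the question asks for (passing a sub-data.frame of the measurement columns is my suggestion, not part of the original reply):

```r
test <- data.frame(year = c(2000, 2000, 2001, 2001),
                   x = c(54, 41, 90, 15),
                   y = c(29,  2, 92, 22),
                   z = c(26, 68, 46, 51))

# Sum every measurement column within each year at once.
aggregate(test[c("x", "y", "z")], by = list(year = test$year), FUN = sum)
#   year   x   y  z
# 1 2000  95  31 94
# 2 2001 105 114 97
```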
[R] indexing an array using an index-array, but one entry being ', '
Hi I have the following array: test <- array(c(1:16), dim = c(3,4,3)). ## I call some entries using an index array: test.ind <- array(rbind(c(1,2,1), c(3,3,2)), dim = c(2,3)); test[test.ind] ## suppose I want all values in the 2nd row and 4th col over all three 3rd dimensions: test[2,4,]. How do I specify a test.ind array with the last index left open (','), i.e. test.ind should be evaluated as 2, 4, , so that it can be called like above as test[test.ind], and the result should be [1] 11 7 3? thanks for a hint Cheers christoph
Re: [R] indexing an array using an index-array, but one entry being ', '
OK, with the hint by Dimitris applied, I just do, very simply: test <- array(c(1:16), dim = c(3,4,3)) ## I call some entries using an index array: test.ind <- array(rbind(c(1,2,1), c(3,3,2)), dim = c(2,3)); test[test.ind] ## suppose I want all values in the 2nd row and 4th col over all three 3rd dimensions: test[2,4,] ## using an index array: nn <- dim(test)[3]; voxel.ind <- c(2, 4); test.ind <- array(cbind(rep(voxel.ind[1], nn), rep(voxel.ind[2], nn), 1:nn), dim = c(nn, 3)); test[test.ind] cheers christoph Christoph Lehmann wrote: [original question quoted]
Re: [R] select cases
subset(your.data.frame, sex == 'male') cheers c Faouzi LYAZRHI wrote: Hi, I would like to select a few cases (for example the cases corresponding to sex == 'male') to compute a summary for another variable. How can I do this? Thanks for your help Fawtzy
Re: [R] Package 'R2HTML'
try this little example: save the code below in a file test.Rnw and then call Sweave("test.Rnw", driver = RweaveHTML()) -- <html> <body> <h1>testing r2html</h1> <p>look at this: here you can write some text</p> <font color="darkred"><b>\Sexpr{format(Sys.time(), "%Y")}</b></font>. <<echo=FALSE>>= summary(data.frame(c(1,2,3), c(3,4,5))) @ <p>insert some graphics</p> <<echo=FALSE,fig=TRUE,border=1,width=900,height=500,HTMLwidth=900,HTMLheight=500>>= print(plot(c(1:30), c(1:30))) @ </body> </html> -- hope it helps, let me know christoph Singh, Avneet wrote: I recently learnt how to use Sweave, which is a wonderful tool. After which I also tried to use R2HTML, as it would allow many of my colleagues who don't use LaTeX to be able to use and edit my work. I was unable to make it work and couldn't find a way to implement it; I got some errors. I wonder if you could help me with it. I have a Windows 2000 OS and the version of R is R version 1.9.1, 2004-06-21. This is the command I gave, followed by the output; I couldn't find much info on SweaveParseOptions: Sweave("Sweave-test-1.Rnw", driver = RweaveHTML) Writing to file Sweave-test-1.html Processing code chunks ... Error: couldn't find function SweaveParseOptions. I have no data yet. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories instead of theories to suit facts. ~ Sir Arthur Conan Doyle (1859-1930), Sherlock Holmes
[R] colClasses = Date in read.delim, how to pass date-format?
Hi I have a huge data set with one column of type date. Of course I can import the data using this column as factor and then convert it later to dates, using:
sws.bezuege$FaktDat <- dates(as.character(sws.bezuege$FaktDat),
                             format = c(dates = "d.m.y"))
But the conversion requires a huge amount of memory (and time); therefore I would like to use colClasses = c("Date"). My question: since I have format = c(dates = "d.m.y"), how can I pass this option to read.delim(..., colClasses = c("Date"))? thanks for a hint cheers christoph
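One route (a sketch, not tested against the original data) is the mechanism documented in ?read.table: colClasses may name any class for which an as() method from "character" exists, so a small S4 coercion can carry the date format. The class name myDate and the file name are hypothetical, and note this yields base-R Date objects, not chron dates:

```r
library(methods)

# define a class and how to coerce character fields into it
setClass("myDate")
setAs("character", "myDate",
      function(from) as.Date(from, format = "%d.%m.%Y"))

# read.delim then converts the column while reading
# (hypothetical file and column name)
sws.bezuege <- read.delim("bezuege.txt",
                          colClasses = c(FaktDat = "myDate"))
```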
Re: [R] colClasses = Date in read.delim, how to pass date-format?
so what do you recommend? I just need to be able to sort a data.frame by the date entry and, e.g., compute differences between subsequent dates. Shall I stay with dates (thanks for the hint about the confusion of Date and dates), or is there a better way for this kind of task? thanks a lot Cheers Christoph
You are confusing class Date (part of R) with class dates (part of package chron). There is no as() method for class dates, so you can't do this. You can read the column as character (not factor) and convert later, but it sounds like the `huge amount of memory (and time)' is in fact taken by package chron.
On Mon, 18 Apr 2005, Christoph Lehmann wrote: I have a huge data set with one column of type date. [...] since I have format = c(dates = "d.m.y"), how can I pass this option to read.delim(..., colClasses = c("Date"))?
-- Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self), +44 1865 272866 (PA)
1 South Parks Road, Oxford OX1 3TG, UK, Fax: +44 1865 272595
[R] read.delim: only first column import
Hi if I use read.delim, I can specify how many lines I want to import. Is there also a way to specify that, e.g., only the first column (field) of each line should be imported? thanks for a hint cheers christoph
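One way (a sketch, assuming a tab-delimited file with, say, five columns): colClasses accepts "NULL" to skip a column entirely and NA for default conversion, so only the first column is ever stored:

```r
# keep only the first of five columns; "big.txt" is a hypothetical file
first.col <- read.delim("big.txt",
                        colClasses = c(NA, rep("NULL", 4)))
```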
[R] aggregation question
Hi I have a question concerning aggregation (simple demo code below). I have the data.frame

   id        meas date
1   a 0.637513747    1
2   a 0.187710063    2
3   a 0.247098459    2
4   a 0.306447690    3
5   b 0.407573577    2
6   b 0.783255085    2
7   b 0.344265082    3
8   b 0.103893068    3
9   c 0.738649586    1
10  c 0.614154037    2
11  c 0.949924371    3
12  c 0.008187858    4

When I want for each id the sum of its meas I do:
aggregate(data$meas, list(id = data$id), sum)
If I want to know the number of meas(ures) for each id I do, e.g.:
aggregate(data$meas, list(id = data$id), length)
NOW: Is there a way to compute the number of meas(ures) for each id with non-identical date (e.g. using diff())? so that I get e.g.:

  id x
1  a 3
2  b 2
3  c 4

I am sure it must be possible. thanks for any (even short) hint cheers Christoph
--
data <- data.frame(c(rep("a", 4), rep("b", 4), rep("c", 4)), runif(12),
                   c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))
names(data) <- c("id", "meas", "date")
m <- aggregate(data$meas, list(id = data$id), sum)
names(m) <- c("id", "cum.meas")
Re: [R] aggregation question
Dear Sundar, dear Andy, many thanks for the length(unique(x)) hint. It solves my problem in a very elegant way, of course. Just out of curiosity (or for potential future problems): how could I solve it in a conceptually different way, namely so that the computation on 'meas' depends on the variable 'date'? That is, the computation on a variable x in the function passed to aggregate is conditional on the value of another variable y. I hope you understand what I mean; let's think of an example: e.g. for the example data.frame below, the sum shall be taken over the variable meas only for all entries with a corresponding 'date' != 2. For this, do I have to nest two aggregate statements, or is there a way using sapply or similar apply-based commands? thanks a lot for your kind help. Cheers! Christoph
aggregate(data$meas, list(id = data$id), sum)
Christoph Lehmann wrote on 4/15/2005 9:51 AM: Hi I have a question concerning aggregation (simple demo code below). [...] NOW: Is there a way to compute the number of meas(ures) for each id with non-identical date (e.g. using diff())?
so that I get e.g.:

  id x
1  a 3
2  b 2
3  c 4

I am sure it must be possible. thanks for any (even short) hint cheers Christoph
--
data <- data.frame(c(rep("a", 4), rep("b", 4), rep("c", 4)), runif(12),
                   c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))
names(data) <- c("id", "meas", "date")
m <- aggregate(data$meas, list(id = data$id), sum)
names(m) <- c("id", "cum.meas")
How about:
m <- aggregate(data["date"], data["id"], function(x) length(unique(x)))
--sundar
--
RE: [R] aggregation question
great, Andy! Thanks a lot - I didn't know split. So 'split' can be used as an alternative to 'aggregate', with the advantage that in the self-defined function one passes, one can consider more than one variable of the to-be-aggregated data.frame? Christoph
If I understood you correctly, here's one way:
sumWO2 <- sapply(split(dat, dat$id), function(d) sum(d$meas[d$date != 2]))
sumWO2
        a         b         c
0.9439614 0.4481582 1.6967618
Andy
From: Christoph Lehmann Dear Sundar, dear Andy, many thanks for the length(unique(x)) hint. [...] For this, do I have to nest two aggregate statements, or is there a way using sapply or similar apply-based commands? [...] NOW: Is there a way to compute the number of meas(ures) for each id with non-identical date (e.g. using diff())?
so that I get e.g.:

  id x
1  a 3
2  b 2
3  c 4

[...] How about:
m <- aggregate(data["date"], data["id"], function(x) length(unique(x)))
--sundar
--
[R] aggregate slow with variables of type 'dates' - how to solve
Dear all, I use aggregate with variables of type numeric and dates. For type numeric, functions such as sum() are very fast, but similarly simple functions, such as min(), are much slower for variables of type 'dates'. The difference gets bigger the larger the 'id' variable is - but see this sample code:

dts <- dates(c("02/27/92", "02/27/92", "01/14/92", "02/28/92", "02/01/92"))
ntimes <- 70
dts <- data.frame(rep(c(1:40), ntimes/8),
                  chron(rep(dts, ntimes), format = c(dates = "m/d/y")),
                  rep(c(0.123, 0.245, 0.423, 0.634, 0.256), ntimes))
names(dts) <- c("id", "date", "tbs")
date()
dat.1st <- aggregate(dts$date, list(id = dts$id), min)$x
dat.1st <- chron(dat.1st, format = c(dates = "m/d/y"))
dat.1st
date() # 82 seconds
date()
tbs.s <- aggregate(as.numeric(dts$tbs), list(id = dts$id), sum)
tbs.s
date() # 17 seconds

---
is it a problem of data type 'dates'? If yes, is there any solution, since for huge data sets this can be a problem... As I mentioned, if variable 'id' has e.g. just 5 levels, the two times are roughly the same, but with the 40 different ids we have this big difference. thanks a lot Christoph
--
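A workaround sketch for the timing gap described above: run aggregate() on the numeric representation of the chron dates (min() on plain numbers is fast) and re-attach the chron format afterwards; this assumes the dts data frame built in the example:

```r
# aggregate on the underlying day numbers instead of the chron objects
dat.1st <- aggregate(as.numeric(dts$date), list(id = dts$id), min)$x

# turn the numeric minima back into chron dates
dat.1st <- chron(dat.1st, format = c(dates = "m/d/y"))
```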
[R] sweave bwplot error
Hi I use Sweave and have a problem with the following figure, but not with other figures:

tt <- data.frame(c("a", "b", "c"), c(1.2, 3, 4.5))
names(tt) <- c("x1", "x2")
bwplot(x2 ~ x1, data = tt)

ok, now in Sweave:

\begin{figure}[H]
\begin{center}
<<echo=FALSE, fig=TRUE, height=5, width=10>>=
lset(col.whitebg())
bwplot(x2 ~ x1, data = tt)
@
\caption{xxx}
\end{center}
\end{figure}

PROBLEM: the pdf of the figure is not created correctly (neither is the eps), and the error I get from Sweave is:
pdf inclusion: required page does not exist 0
thanks for help christoph
[R] scan html: sep = td
Hi I try to import HTML text, and I need to split the fields at each <td> or </td> entry. How can I succeed? sep = 'td' doesn't yield the right result. thanks for hints
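scan()'s sep must be a single character, so one sketch is to read whole lines and split on the tags with a regular expression (strsplit() treats its split argument as a regex); the file name is hypothetical:

```r
# read the raw HTML line by line (hypothetical file name)
lines <- readLines("page.html")

# "</?td>" matches both <td> and </td>, splitting each line into fields
fields <- strsplit(lines, "</?td>")
```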
[R] aov or t-test applied on all variables of a data.frame
Hi I have a data.frame with, say, 10 continuous variables and one grouping factor (say 3 levels). How can I easily (without loops) apply to each continuous variable e.g. an aov with the grouping factor as my factor (or, if the grouping factor has 2 levels, e.g. a t-test)? thanks for a hint cheers christoph
Re: [R] aov or t-test applied on all variables of a data.frame
many thanks for the sapply hint. How can I use sapply to get a compact result of the aov computation? Say I call
sapply(dd[-1], function(y, f) aov(y ~ f), f = dd$V1)
aov gives its result in another form than t.test. thanks a lot
Peter Dalgaard wrote: Christoph Lehmann [EMAIL PROTECTED] writes: Hi I have a data.frame with, say, 10 continuous variables and one grouping factor [...] thanks for a hint
Generally something with lapply or sapply, e.g.
lapply(dd[-1], function(y) t.test(y ~ dd$V1))
$V2
        Welch Two Sample t-test
data:  y by dd$V1
t = 1.5465, df = 39.396, p-value = 0.13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.02500802  0.18764439
sample estimates:
mean in group 1 mean in group 2
       1.096818        1.015500
...etc, one for each of V2..V8, or, in a more compact form:
sapply(dd[-1], function(y) t.test(y ~ dd$V1))[1:3,]
                 V2        V3         V4         V5        V6        V7
statistic  1.546456  1.008719 0.08158578 -0.2456436 -0.872376 -1.405966
parameter  39.39554  36.30778   39.70288   36.99061  36.99944  35.97947
p.value   0.1299909 0.3197851   0.935386   0.807316 0.3886296 0.1683118
                  V8
statistic -0.6724112
parameter   29.65156
p.value    0.5065284
or (this'll get the confidence intervals and estimates printed sensibly):
sapply(dd[-1], function(y) unlist(t.test(y ~ dd$V1)[1:5]))
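For a compact aov result, one sketch (same dd / dd$V1 layout as in the examples above) is to keep only the F-test p-value from each fit:

```r
# one p-value per continuous variable; summary(aov(...))[[1]] is the ANOVA
# table, whose "Pr(>F)" column holds the p-value for the grouping factor
sapply(dd[-1], function(y) summary(aov(y ~ dd$V1))[[1]][["Pr(>F)"]][1])
```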
[R] plot question
I have the following simple situation:
tt <- data.frame(c(0.5, 1, 0.5))
names(tt) <- "a"
plot(tt$a, type = 'o')
gives the following plot [ASCII sketch: the points at x = 1, 2, 3 are drawn flush against the left and right plot borders]. What do I have to change to get the following [same sketch, but with the first and last points sitting inside the borders]? i.e. the plot region should be widened at the left and right side. thanks for a hint christoph
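One way to get the padding (a sketch): extend xlim beyond the data range so the first and last points sit inside the borders:

```r
tt <- data.frame(a = c(0.5, 1, 0.5))

# x runs from 1 to 3; asking for 0.5..3.5 leaves space at both sides
plot(tt$a, type = "o", xlim = c(0.5, 3.5))
```

The related par() setting xaxs controls whether R pads the requested range by 4% ("r", the default) or uses it exactly ("i").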
[R] matlab norm(h) command in R: sqrt(sum(h^2)) - use in an expression
Hi in matlab I defined a function (double gamma, parameters at the end of this mail) as
h(i)=((t/d1)^a1)*exp(-(t-d1)/b1)-c*((t/d2)^a2)*exp(-(t-d2)/b2);
h=h/norm(h);
I do know that norm() in matlab is equal to sqrt(sum(x^2)) in R, so in R I do it like:
# function (double gamma)
h <- expression((t/d1)^a1*exp(-(t-d1)/b1) - c*(t/d2)^a2*exp(-(t-d2)/b2))
# plot it
t <- seq(0, 2, by = 100)
t <- t/1000
plot(eval(h), type = 'l')
# however this yields an error
h <- h/sqrt(sum(h^2))
Error in h^2 : non-numeric argument to binary operator
what shall I do to get the matlab h = h/norm(h) implemented in R? thanks for a hint christoph
# parameters
peak1 <- 5.4
fwhm1 <- 5.2
peak2 <- 10.8
fwhm2 <- 7.35
dip <- 0.35
b1 <- 0.9 # dispersion
b2 <- 0.9 # dispersion
a1 <- peak1/b1
a2 <- peak2/b2
d1 <- a1*b1
d2 <- a2*b2
c <- dip
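A sketch of the usual fix: norm() must act on numbers, so evaluate h over the grid first and normalize the resulting vector (for a vector, matlab's norm(h) is sqrt(sum(h^2))); this assumes t and the parameters defined in the post:

```r
hv <- eval(h)                 # numeric vector: h evaluated at the grid t
hv <- hv / sqrt(sum(hv^2))    # matlab's h = h/norm(h)
```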
[R] D(eval(g)) problem, since Function `eval' is not in the derivatives table
thanks Andy and Dimitris for your reply to my expression/eval problem. Starting with the resulting expression g, I need g's derivative as an expression, but I get: Function `eval' is not in the derivatives table:
# function (double gamma)
h <- expression((t/d1)^a1*exp(-(t-d1)/b1) - c*(t/d2)^a2*exp(-(t-d2)/b2))
# plot it
t <- seq(0, 2, by = 100)
t <- t/1000
g <- expression(eval(h)/sqrt(sum(eval(h)^2)))
plot(eval(g), type = 'l')
g.deriv <- D(g, "t")
Error in D(g, t) : Function `eval' is not in the derivatives table
is there any way to solve this problem? thanks a lot christoph
## --
## parameters
peak1 <- 5.4
fwhm1 <- 5.2
peak2 <- 10.8
fwhm2 <- 7.35
dip <- 0.35
b1 <- 0.9 # dispersion
b2 <- 0.9 # dispersion
a1 <- peak1/b1
a2 <- peak2/b2
d1 <- a1*b1
d2 <- a2*b2
c <- dip
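One possible workaround (a sketch, assuming h, t and the parameters from the post): D() handles the plain expression, and since the normalizing constant sqrt(sum(h^2)) is a fixed number once the grid is fixed, the derivative of h/||h|| is simply D(h)/||h||:

```r
dh  <- D(h[[1]], "t")        # symbolic derivative of the un-normalized h
hv  <- eval(h)
nrm <- sqrt(sum(hv^2))       # constant with respect to t
g.deriv.values <- eval(dh) / nrm
```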
[R] R on solaris 10 on a 64-bit ultra60 sparc
Hi since I have no experience with Solaris and the SPARC architecture: we installed the latest Solaris 10 on a 64-bit Ultra 60 SPARC machine. Since Solaris 10 is said to run native Linux applications: can I just download any R binaries for Linux? If yes, for which distribution? Or are there any Solaris 10 binaries? Or do I need to compile R myself on the SPARC? sorry for this beginner question, but I am grateful for any small hint. cheers christoph
[R] label outliers in boxplot and/or bwplot
Hi Is there a way to label (e.g. with the observation number) the outliers in a boxplot? And in a bwplot? thanks a lot Christoph P.S. identify() is not available with bwplot, is it?
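For base-graphics boxplots, a sketch: boxplot() invisibly returns the outlier values ($out) and the group each belongs to ($group), which text() can then label, here with observation numbers via match() (this assumes the data values are unique):

```r
set.seed(1)
x <- c(rnorm(50), 4, -5)        # two artificial outliers

b <- boxplot(x)
text(b$group + 0.1, b$out,      # just right of each flagged point
     labels = match(b$out, x))  # observation numbers of the outliers
```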
Re: [R] Handling large data sets via scan()
Does it solve your problem, at least in part, if you use read.table() instead of scan(), since it imports data directly into a data.frame? Let me know if it helps.
Nawaaz Ahmed wrote: I'm trying to read in datasets with roughly 150,000 rows and 600 features. I wrote a function using scan() to read it in (I have a 4GB linux machine) and it works like a charm. Unfortunately, converting the scanned list into a data.frame using as.data.frame() causes the memory usage to explode (it can go from 300MB for the scanned list to 1.4GB for a data.frame of 3 rows), and it fails claiming it cannot allocate memory (though it is still not close to the 3GB limit per process on my linux box - the message is "unable to allocate vector of size 522K"). So I have three questions:
1) Why is it failing even though there seems to be enough memory available?
2) Why is converting it into a data.frame causing the memory usage to explode? Am I using as.data.frame() wrongly? Should I be using some other command?
3) All the model-fitting packages seem to want data.frames as their input. If I cannot convert my list into a data.frame, what can I do? Is there any way of getting around this?
Much thanks! Nawaaz
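A further knob worth trying (a sketch with hypothetical file name, matching the 600-column layout described above): giving read.table the column classes and row count up front avoids type guessing, factor conversion and repeated reallocation, which is where much of the memory overhead comes from:

```r
d <- read.table("big.dat", header = TRUE,
                colClasses = c("character", rep("numeric", 599)),
                nrows = 150000,        # size hint: preallocate once
                comment.char = "")     # don't scan for comments
```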
[Fwd: Re: [R] vectorization of a data-aggregation loop]
great! many thanks, Phil. Cheers christoph
Phil Spector wrote: Christoph - I think reshape is the function you're looking for:
tt <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
                       c(10,12,8,33,34,3,27,77,34,45,4,39),
                       c('a', 'b', 'b', 'a', 'c', 'c', 'c', 'a', 'b', 'a', 'b', 'c')))
names(tt) <- c("id", "iwv", "type")
reshape(aggregate(as.numeric(tt$iwv), list(id = tt$id, type = tt$type), sum),
        idvar = "id", timevar = "type", direction = "wide")
  id x.a x.b x.c
1  1   6  13   6
2  2  10  NA   7
3  3   9  14   7
- Phil Spector, Statistical Computing Facility, Department of Statistics, UC Berkeley, [EMAIL PROTECTED]
On Tue, 1 Feb 2005, Christoph Lehmann wrote: Hi I have a simple question: the following data.frame

   id iwv type
1   1   1    a
2   1   2    b
3   1  11    b
4   1   5    a
5   1   6    c
6   2   4    c
7   2   3    c
8   2  10    a
9   3   6    b
10  3   9    a
11  3   8    b
12  3   7    c

shall be aggregated into the form:

   id t.a t.b t.c
1   1   6  13   6
6   2  10   0   7
9   3   9  14   7

means for each 'type' (a, b, c) a new column is introduced which gets the sum of iwv for the respective observations 'id'. Of course I can do this transformation/aggregation in a loop (see below), but is there a way to do this more efficiently, e.g. using tapply (or something similar), since I have very many rows?
thanks for a hint christoph
#--
# the loop-way
[... loop code identical to the original posting below, snipped ...]
#--
[R] vectorization of a data-aggregation loop
Hi I have a simple question: the following data.frame

   id iwv type
1   1   1    a
2   1   2    b
3   1  11    b
4   1   5    a
5   1   6    c
6   2   4    c
7   2   3    c
8   2  10    a
9   3   6    b
10  3   9    a
11  3   8    b
12  3   7    c

shall be aggregated into the form:

   id t.a t.b t.c
1   1   6  13   6
6   2  10   0   7
9   3   9  14   7

means for each 'type' (a, b, c) a new column is introduced which gets the sum of iwv for the respective observations 'id'. Of course I can do this transformation/aggregation in a loop (see below), but is there a way to do this more efficiently, e.g. using tapply (or something similar), since I have very many rows? thanks for a hint christoph
#--
# the loop-way
t <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
                      c(10,12,8,33,34,3,27,77,34,45,4,39),
                      c('a', 'b', 'b', 'a', 'c', 'c', 'c', 'a', 'b', 'a', 'b', 'c')))
names(t) <- c("id", "iwv", "type")
t$iwv <- as.numeric(t$iwv)
t
# define the additional columns (t.a, t.b, t.c)
tt <- rep(0, nrow(t) * length(levels(t$type)))
dim(tt) <- c(nrow(t), length(levels(t$type)))
tt <- data.frame(tt)
dimnames(tt)[[2]] <- paste("t.", levels(t$type), sep = "")
t <- cbind(t, tt)
t
obs <- 0
obs.previous <- 0
row.elim <- rep(FALSE, nrow(t))
ta <- which(names(t) == "t.a") # number of the column which codes the first type
r.ctr <- 0
for (i in 1:nrow(t)) {
  obs <- t[i,]$id
  if (obs == obs.previous) {
    row.elim[i] <- TRUE
    r.ctr <- r.ctr + 1 # increment
    type.col <- as.numeric(t[i,]$type)
    t[i - r.ctr, ta - 1 + type.col] <- t[i - r.ctr, ta - 1 + type.col] + t[i,]$iwv
  } else {
    r.ctr <- 0 # record counter
    type.col <- as.numeric(t[i,]$type)
    t[i, ta - 1 + type.col] <- t[i,]$iwv
  }
  obs.previous <- obs
}
t <- t[!row.elim,]
t <- subset(t, select = -c(iwv, type))
t
#--
Re: [R] several boxplots or bwplots into one graphic
many thanks for your great tip. I didn't know reshape. Unfortunately, in my real data the values of my variables are not all within the same range. Therefore, what shall I change in the code to get for each plot a scale which is adjusted to the range of my variable? thanks a lot christoph
Chuck Cleland wrote: Your variables (var.*) seem to be on the same scale. How about reshaping the data into a univariate layout and then using bwplot as follows:
mydata <- data.frame(ID = 1:20, A = runif(20), B = runif(20), C = runif(20),
                     GROUP = rep(c(0,1), c(10,10)))
mydata.uni <- reshape(mydata, varying = list(c("A", "B", "C")),
                      v.names = "Y", timevar = "VAR",
                      times = c("A", "B", "C"), direction = "long")
library(lattice)
bwplot(Y ~ as.factor(GROUP) | VAR, data = mydata.uni, layout = c(3,1,1),
       xlab = "Group")
hope this helps, Chuck Cleland
Christoph Lehmann wrote: Hi I have 10 variables and 2 groups. I know how to plot a bwplot for ONE var. [...] How can I plot 10 bwplots (or boxplots) automatically into one graphic? Is there any function from lattice which can do this? thanks for a short hint christoph
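For the follow-up question about differing ranges, a sketch: lattice's scales argument can give every panel its own y-axis (this reuses the mydata.uni object from the reply):

```r
bwplot(Y ~ as.factor(GROUP) | VAR, data = mydata.uni,
       layout = c(3, 1, 1), xlab = "Group",
       scales = list(y = list(relation = "free")))  # per-panel y-range
```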
[R] several boxplots or bwplots into one graphic
Hi I have 10 variables and 2 groups. I know how to plot a bwplot for ONE var, e.g.

var.a var.b var.c .. GROUP
  0.2   0.5   0.2 ..     0
  0.3   0.2   0.2 ..     0
  ..
  0.1   0.8   0.7 ..     1
  0.5   0.5   0.1 ..     1
  ..

bwplot(var.a ~ GROUP, data = my.data)
How can I plot 10 bwplots (or boxplots) automatically into one graphic? Is there any function from lattice which can do this? thanks for a short hint christoph
[R] classification for huge datasets: SVM yields memory troubles
Hi I have a matrix with 30 observations and roughly 3 variables, each obs belonging to one of two groups. With svm and slda I get into memory trouble ('cannot allocate vector of size' roughly 2G). PCA + LDA runs fine. Is there any way around the memory issue with SVMs? Or can you recommend any other classification method for such huge datasets? P.S. I run SuSE 9.1 on a PIV machine with 2G RAM. thanks for a hint Christoph
Re: [R] About ROC curves
the package ROC from Bioconductor, e.g.: http://www.bioconductor.org/repository/release1.5/package/Win32/ Cheers! Christoph
Xin Qi wrote: Hi, dear all R users: Does someone know whether R can calculate Receiver Operating Characteristic (ROC) curves? I didn't find it in the packages. Thanks a lot. --- Xin
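If installing a package is not an option, a bare-bones ROC curve can also be computed by hand; a sketch with made-up scores and 0/1 labels:

```r
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3)  # classifier output
labels <- c(1, 1, 0, 1, 0, 1, 0, 0)                   # true classes

# sweep a threshold over the observed scores, highest first
cuts <- sort(unique(scores), decreasing = TRUE)
tpr  <- sapply(cuts, function(th) mean(scores[labels == 1] >= th))
fpr  <- sapply(cuts, function(th) mean(scores[labels == 0] >= th))

plot(c(0, fpr), c(0, tpr), type = "b", xlab = "FPR", ylab = "TPR")
```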
[R] LDA with previous PCA for dimensionality reduction
Dear all, not really an R question, but: if I want to check the classification accuracy of an LDA with a previous PCA for dimensionality reduction by means of the LOOCV method: is it OK to do the PCA on the WHOLE dataset ONCE and then run the LDA with the CV option set to TRUE (which runs LOOCV)
-- OR --
do I need
- to compute for each 'training bag' (the n-1 observations) a PCA (-> my.princomp.1),
- then run the LDA on the training-bag scores (-> my.lda.1),
- then compute the scores of the left-out observation using my.princomp.1 (-> my.scores.2),
- and only then use predict.lda(my.lda.1, my.scores.2) on the scores of the left-out observation?
I read some articles where they chose procedure 1, but I am not sure whether this is really correct. many thanks for a hint Christoph
Re: [R] LDA with previous PCA for dimensionality reduction
Thank you, Torsten; that's what I thought: as long as one does not use the class label as a constraint in the dimension reduction, the procedure is OK. Of course it is computationally more demanding, since for each new observation (unknown with respect to the class label) one has to compute a new PCA as well. Cheers Christoph
Torsten Hothorn wrote: On Wed, 24 Nov 2004, Ramon Diaz-Uriarte wrote: Dear Christoph, I guess you want to assess the error rate of an LDA that has been fitted to a set of currently existing training data, and that in the future you will get some new observation(s) for which you want to make a prediction. Then, I'd say that you want to use the second approach. You might find that the first step turns out to be crucial and, after all, your whole subsequent LDA is contingent on the PC scores you obtain in the previous step.
Ramon, as long as one does not use the information in the response (the class variable, in this case) I don't think that one ends up with an optimistically biased estimate of the error (although leave-one-out is a suboptimal choice). Of course, when one starts to tune the method used for dimension reduction, a selection of the procedure with minimal error will produce a bias. Or am I missing something important? Btw, `ipred::slda' implements something not completely unlike the procedure Christoph is interested in. Best, Torsten
Somewhat similar issues have been discussed in the microarray literature. Two references are:

@ARTICLE{ambroise-02,
  author  = {Ambroise, C. and McLachlan, G. J.},
  title   = {Selection bias in gene extraction on the basis of microarray gene-expression data},
  journal = {Proc Natl Acad Sci USA},
  year    = {2002},
  volume  = {99},
  number  = {10},
  pages   = {6562--6566}
}

@ARTICLE{simon-03,
  author  = {Simon, R. and Radmacher, M. D. and Dobbin, K. and McShane, L. M.},
  title   = {Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification},
  journal = {Journal of the National Cancer Institute},
  year    = {2003},
  volume  = {95},
  number  = {1},
  pages   = {14--18}
}

I am not sure, though, why you use PCA followed by LDA. But that's another story. Best, R.
On Wednesday 24 November 2004 11:16, Christoph Lehmann wrote: Dear all, not really an R question, but: [...] I read some articles where they chose procedure 1, but I am not sure whether this is really correct. many thanks for a hint Christoph
-- Ramón Díaz-Uriarte, Bioinformatics Unit, Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center), Melchor Fernández Almagro, 3, 28029 Madrid (Spain), Fax: +-34-91-224-6972, Phone: +-34-91-224-6900, http://ligarto.org/rdiaz, PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc)
[R] biplot.princomp with loadings only
Hi is there a way to plot only the loadings in a biplot (with the nice arrows) and to skip the scores? thanks christoph
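One way (a sketch, using the built-in USArrests data for illustration): skip biplot() entirely and draw the loadings yourself with arrows():

```r
pc <- princomp(USArrests, cor = TRUE)
ld <- pc$loadings[, 1:2]            # loadings on the first two components

plot(ld, type = "n", xlab = "Comp.1", ylab = "Comp.2")  # empty canvas
arrows(0, 0, ld[, 1], ld[, 2], length = 0.1)            # one arrow per variable
text(ld[, 1] * 1.1, ld[, 2] * 1.1, rownames(ld))        # variable names
```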
[R] glm.fit and predict.glm: error ' no terms component'
Hi when I fit a glm by glm.fit(x, y, family = binomial()) and then try to use the object for prediction on new data by predict.glm(object, newdata), I get the error:
Error in terms.default(object) : no terms component
I know I can use glm() and a formula, but for my case I prefer glm.fit(x, y)... thanks for a hint christoph

$platform
[1] "i686-pc-linux-gnu"
$arch
[1] "i686"
$os
[1] "linux-gnu"
$system
[1] "i686, linux-gnu"
$status
[1] ""
$major
[1] "1"
$minor
[1] "9.1"
$year
[1] "2004"
$month
[1] "06"
$day
[1] "21"
$language
[1] "R"
Re: [R] glm.fit and predict.glm: error ' no terms component'
many thanks. I did it the following way, based on Thomas' suggestion:

predict.glm.fit <- function(glmfit, newmatrix) {
  newmatrix <- cbind(1, newmatrix)
  coef <- rbind(1, as.matrix(glmfit$coef))
  eta <- as.matrix(newmatrix) %*% as.matrix(coef)
  exp(eta)/(1 + exp(eta))
}

cheers, christoph

Thomas Lumley wrote: On Wed, 29 Sep 2004, Christoph Lehmann wrote: Hi when I fit a glm by glm.fit(x, y, family = binomial()) and then try to use the object for prediction of newdata by predict.glm(object, newdata) I get the error: Error in terms.default(object) : no terms component. I know I can use glm() and a formula, but for my case I prefer glm.fit(x, y)...

Well, you can't use predict.glm that way. As the function name suggests, it is a predict method for objects of class "glm", which in your case you do not have. There are two reasons why it won't work. For type = "terms" the formula is needed to identify terms, and for any type of prediction the formula is needed to convert the data frame newdata into a model matrix. You would need to write a function where the new data is a model matrix. If you only need point predictions then

predict_glm_fit <- function(glmfit, newmatrix, addintercept = TRUE) {
  if (addintercept) newmatrix <- cbind(1, newmatrix)
  eta <- newmatrix %*% glmfit$coef
  glmfit$family$linkinv(eta)
}

would work. -thomas

__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
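A self-contained sketch of the approach Thomas describes (the simulated data, the helper name, and the use of the family stored inside the glm.fit result are my own choices, not from the thread). glm.fit keeps no terms or formula, so prediction has to go through a model matrix built by hand:

```r
# Predict on new data from a glm.fit object by building the linear
# predictor manually from a model matrix.
set.seed(1)
x <- cbind(1, rnorm(50))            # model matrix incl. intercept column
y <- rbinom(50, 1, plogis(0.5 + 1.5 * x[, 2]))
fit <- glm.fit(x, y, family = binomial())

predict_glm_fit <- function(glmfit, newmatrix) {
  eta <- newmatrix %*% glmfit$coefficients
  glmfit$family$linkinv(eta)        # inverse link: here plogis()
}

newx <- cbind(1, c(-1, 0, 1))
p <- predict_glm_fit(fit, newx)     # fitted probabilities for 3 new rows

# the same fit via glm() + formula, for comparison
fit2 <- glm(y ~ x[, 2], family = binomial())
```

The coefficients of `fit` and `fit2` agree to numerical precision, since glm() calls glm.fit on the same design matrix internally.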
[R] nnet with weights parameter: odd error
Dear R-users,

I use nnet for a classification (2 classes) problem, with the code CVnn1/CVnn2 as described in VR. The one thing I changed in the code: I define the (class) weight for each observation in each cv 'bag' and pass the vector of weights as the weights parameter of nnet(..., weights = weight.vector, ...). Unfortunately I get an error during some (but not all!) inner-fold cv runs:

Error in model.frame(formula, rownames, variables, varnames, extras, extranames, : variable lengths differ

If you just remove the weights parameter in nnet() it runs fine!! I debugged the code but could not resolve the problem. It is really very strange and I need your help! To keep it simple, I defined the weights as 1 for each observation (as is the default):

CVnn2 <- function(formula, data, size = c(0,4,4,10,10),
                  lambda = c(0, rep(c(0.001, 0.01), 2)),
                  nreps = 1, nifold = 5, verbose = 99, ...)
{
  resmatrix <- function(predict.matrix, learn, data, ri, i) {
    rae.matrix <- predict.matrix
    rae.matrix[,] <- 0
    rae.vector <- as.numeric(as.factor(predict(learn, data[ri == i,], type = "class")))
    for (k in 1:dim(rae.matrix)[1]) {
      if (rae.vector[k] == 1) rae.matrix[k,1] <- rae.matrix[k,1] + 1
      else rae.matrix[k,2] <- rae.matrix[k,2] + 1
    }
    rae.matrix
  }
  CVnn1 <- function(formula, data, nreps = 1, ri, verbose, ...) {
    totalerror <- 0
    truth <- data[, deparse(formula[[2]])]
    res <- matrix(0, nrow(data), length(levels(truth)))
    if (verbose > 20) cat("  inner fold")
    for (i in sort(unique(ri))) {
      if (verbose > 20) cat(" ", i, sep = "")
      data.training <- data[ri != i,]$GROUP
      weight.vector <- rep(1, dim(data[ri != i,])[1])
      for (rep in 1:nreps) {
        learn <- nnet(formula, data[ri != i,], weights = weight.vector, trace = F, ...)
        #res[ri == i,] <- res[ri == i,] + predict(learn, data[ri == i,])
        res[ri == i,] <- res[ri == i,] + resmatrix(res[ri == i,], learn, data, ri, i)
      }
    }
    if (verbose > 20) cat("\n")
    sum(as.numeric(truth) != max.col(res/nreps))
  }
  truth <- data[, deparse(formula[[2]])]
  res <- matrix(0, nrow(data), length(levels(truth)))
  choice <- numeric(length(lambda))
  for (i in sort(unique(rand))) {
    if (verbose > 0) cat("fold ", i, "\n", sep = "")
    set.seed(i*i)
    ri <- sample(nifold, sum(rand != i), replace = T)
    for (j in seq(along = lambda)) {
      if (verbose > 10) cat("  size =", size[j], "decay =", lambda[j], "\n")
      choice[j] <- CVnn1(formula, data[rand != i,], nreps = nreps, ri = ri,
                         size = size[j], decay = lambda[j], verbose = verbose, ...)
    }
    decay <- lambda[which.is.max(-choice)]
    csize <- size[which.is.max(-choice)]
    if (verbose > 5) cat("  #errors:", choice, " ")
    # if (verbose > 1) cat("  chosen size = ", csize, " decay = ", decay, "\n", sep = "")
    for (rep in 1:nreps) {
      data.training <- data[rand != i,]$GROUP
      weight.vector <- rep(1, dim(data[rand != i,])[1])
      learn <- nnet(formula, data[rand != i,], weights = weight.vector,
                    trace = F, size = csize, decay = decay, ...)
      #res[rand == i,] <- res[rand == i,] + predict(learn, data[rand == i,])
      res[rand == i,] <- res[rand == i,] + resmatrix(res[rand == i,], learn, data, rand, i)
    }
  }
  factor(levels(truth)[max.col(res/nreps)], levels = levels(truth))
}

res.nn2 <- CVnn2(GROUP ~ ., rae.data.subsetted1, skip = T, maxit = 500, nreps = cv.repeat)
con(true = rae.data.subsetted$GROUP, predicted = res.nn2)

### Coordinates: i686-pc-linux-gnu, linux-gnu; R 1.9.1 (2004-06-21)

Thanks a lot. Best regards, Christoph

-- Christoph Lehmann, Department of Psychiatric Neurophysiology, University Hospital of Clinical Psychiatry, Waldau, CH-3000 Bern 60; Phone: ++41 31 930 93 83, Mobile: ++41 76 570 28 00, Fax: ++41 31 930 99 61, [EMAIL PROTECTED], http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html

__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] nnet and weights: error analysis using VR example
Dear R-users, dear Prof. Ripley (as package maintainer),

I tried to investigate the odd error that occurs when I call nnet together with a 'weights' parameter, using the 'fgl' example in VR p. 348. The error I get is:

Error in eval(expr, envir, enclos) : Object "w" not found

I think it is a kind of scoping problem, but I really cannot see what the problem exactly is. Here is my code: the only thing that changed is the definition of a weight parameter ('w') which is given to the nnet call. Of course the weight vector of '1's makes no sense here, but it will be defined according to the class sizes later.

###
library(MASS)
data(fgl)

con <- function(...) {
  print(tab <- table(...))
  diag(tab) <- 0
  cat("error rate = ", round(100*sum(tab)/length(list(...)[[1]]), 2), "%\n")
  invisible()
}

set.seed(123)
rand <- sample(10, dim(fgl)[1], replace = T)
fgl1 <- fgl
fgl1[1:9] <- lapply(fgl[, 1:9], function(x) { r <- range(x); (x - r[1])/diff(r) })

CVnn2 <- function(formula, data, size = c(0,4,4,10,10),
                  lambda = c(0, rep(c(0.001, 0.01), 2)),
                  nreps = 1, nifold = 5, verbose = 99, ...)
{
  CVnn1 <- function(formula, data, nreps = 1, ri, verbose, ...) {
    totalerror <- 0
    truth <- data[, deparse(formula[[2]])]
    res <- matrix(0, nrow(data), length(levels(truth)))
    if (verbose > 20) cat("  inner fold")
    for (i in sort(unique(ri))) {
      if (verbose > 20) cat(" ", i, sep = "")
      data.training <- data[ri != i,]$GROUP
      w <- rep(1, dim(data[ri != i,])[1])
      for (rep in 1:nreps) {
        learn <- nnet(formula, data[ri != i,], weights = w, trace = F, ...)
        res[ri == i,] <- res[ri == i,] + predict(learn, data[ri == i,])
      }
    }
    if (verbose > 20) cat("\n")
    sum(as.numeric(truth) != max.col(res/nreps))
  }
  truth <- data[, deparse(formula[[2]])]
  res <- matrix(0, nrow(data), length(levels(truth)))
  choice <- numeric(length(lambda))
  for (i in sort(unique(rand))) {
    if (verbose > 0) cat("fold ", i, "\n", sep = "")
    set.seed(i*i)
    ri <- sample(nifold, sum(rand != i), replace = T)
    for (j in seq(along = lambda)) {
      if (verbose > 10) cat("  size =", size[j], "decay =", lambda[j], "\n")
      choice[j] <- CVnn1(formula, data[rand != i,], nreps = nreps, ri = ri,
                         size = size[j], decay = lambda[j], verbose = verbose, ...)
    }
    decay <- lambda[which.is.max(-choice)]
    csize <- size[which.is.max(-choice)]
    if (verbose > 5) cat("  #errors:", choice, " ")
    # if (verbose > 1) cat("  chosen size = ", csize, " decay = ", decay, "\n", sep = "")
    for (rep in 1:nreps) {
      data.training <- data[rand != i,]$GROUP
      w <- rep(1, dim(data[rand != i,])[1])
      learn <- nnet(formula, data[rand != i,], weights = w, trace = F,
                    size = csize, decay = decay, ...)
      res[rand == i,] <- res[rand == i,] + predict(learn, data[rand == i,])
    }
  }
  factor(levels(truth)[max.col(res/nreps)], levels = levels(truth))
}

res.nn2 <- CVnn2(type ~ ., fgl1, skip = T, maxit = 500, nreps = 10)
con(true = fgl$type, predicted = res.nn2)
##

many thanks for your help, Christoph

### Coordinates: i686-pc-linux-gnu, linux-gnu; R 1.9.1 (2004-06-21)

-- Christoph Lehmann, Department of Psychiatric Neurophysiology, University Hospital of Clinical Psychiatry, Waldau, CH-3000 Bern 60; Phone: ++41 31 930 93 83, Mobile: ++41 76 570 28 00, Fax: ++41 31 930 99 61, [EMAIL PROTECTED], http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html

__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] memory problem under windows
I have (still) some memory problems when trying to allocate a huge array. WinXP pro with 2G RAM; I start R by calling Rgui.exe --max-mem-size=2Gb (as pointed out in the R for Windows FAQ). R.Version(): i386-pc-mingw32, 1.9.1, 2004-06-21.

## and here the problem
x.dim <- 46
y.dim <- 58
slices <- 40
volumes <- 1040
a <- rep(0, x.dim * y.dim * slices * volumes)
dim(a) <- c(x.dim, y.dim, slices, volumes)

gives me:

Error: cannot allocate vector of size 850425 Kb

even though memory.limit(size = NA) yields 2147483648 and memory.size() gives 905838768. So why is that, and what can I do against it? Many thanks for your kind help. Cheers, Christoph

-- Christoph Lehmann, Department of Psychiatric Neurophysiology, University Hospital of Clinical Psychiatry, Waldau, CH-3000 Bern 60; Phone: ++41 31 930 93 83, Mobile: ++41 76 570 28 00, Fax: ++41 31 930 99 61, [EMAIL PROTECTED], http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html

__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
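For orientation, a back-of-the-envelope calculation (my own, not from the thread): an array of doubles of that size needs about x.dim * y.dim * slices * volumes * 8 bytes, and R must find it as one contiguous block, so allocation can fail well below the --max-mem-size limit when the address space is fragmented:

```r
# Approximate size of the requested array of doubles
x.dim <- 46; y.dim <- 58; slices <- 40; volumes <- 1040
n.cells <- x.dim * y.dim * slices * volumes   # 110,988,800 cells
size.mb <- n.cells * 8 / 2^20                 # 8 bytes per double
round(size.mb)                                # roughly 850 Mb, in one block
```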
[R] Variable Importance in pls: R or B? (and in glpls?)
Dear R-users, dear Ron I use pls from the pls.pcr package for classification. Since I need to know which variables are most influential onto the classification performance, what criteria shall I look at: a) B, the array of regression coefficients for a certain model (means a certain number of latent variables) (and: squared or absolute values?) OR b) the weight matrix RR (or R in the De Jong publication; in Ding Gentleman this is the P Matrix and called 'loadings')? (and again: squared or absolute values?) and what about glpls (glpls1a) ? shall I look at the 'coefficients' (regression coefficients)? Thanks for clarification Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] locator() in a multi-figure setting using mfrow(): SOLVED
thanks to some great hints by Paul Murrell I could solve it. Here is one solution (the code needs to be cleaned and simplified, but maybe one can understand it):

###
## create a multi-figure setting
nr <- 4
nc <- 2
opar <- par(mfrow = c(nr, nc))
slices <- 8
m <- matrix(runif(100), 10, 10)
my.list <- list()
for (slice in 1:slices) {
  my.list[[slice]] <- m
}
for (slice in 1:slices) {
  x <- 1*(1:25)
  y <- 1*(1:25)
  z <- my.list[[slice]]
  image(list(x = 0:9, y = 0:9, z = z))
}

my.get.coord <- function() {
  par(mfg = c(1,1))          # locator() shall be relative to the first plot
                             # out of the eight plots
  my.loc <- locator(1)       # location in plot units, not inches
  my.plot.region <- par("usr")   # extremes of the plotting region
                                 # (in plot units, not inches)
  my.plot.region.x <- my.plot.region[2] - my.plot.region[1]
  my.plot.region.y <- my.plot.region[4] - my.plot.region[3]
  # par("pin"): current plot dimension in inches; coordinates are relative to
  # the plot-region bottom-left corner, not the axis c(0,0) point
  my.loc.inch.x <- (my.loc$x + 0.5)/my.plot.region.x * par("pin")[1]
  my.loc.inch.y <- (my.loc$y + 0.5)/my.plot.region.y * par("pin")[2]
  ## search the plot we are in with locator(1)
  my.plot.inch.x <- par("pin")[1] + par("mai")[2] + par("mai")[4]
                             # plot.x + left/right margin = par("fin")[1]
  my.plot.inch.y <- par("pin")[2] + par("mai")[1] + par("mai")[3]
                             # plot.y + bottom/top margin = par("fin")[2]
  pos.rel.x <- (my.loc.inch.x / par("fin")[1] - floor(my.loc.inch.x / par("fin")[1])) *
    par("fin")[1] / par("pin")[1] * (par("usr")[2] - par("usr")[1]) - 0.5
  # position within the target plot region (c(0,0) is the plot-region
  # bottom-left corner, not the axis c(0,0) point)
  pos.rel.y <- (my.loc.inch.y / par("fin")[2] - floor(my.loc.inch.y / par("fin")[2])) *
    par("fin")[2] / par("pin")[2] * (par("usr")[4] - par("usr")[3]) - 0.5
  fig.coord.x <- ceiling(my.loc.inch.x / par("fin")[1])
  fig.coord.y <- 1 + (-1) * ceiling(my.loc.inch.y / par("fin")[2])
  # cat("figure-coord x: ", fig.coord.x, "\n")
  # cat("figure-coord y: ", fig.coord.y, "\n")
  cat("we are in figure: ", fig.coord.y * nc + fig.coord.x, "\n")
  cat("coordinates of the identified point x: ", round(pos.rel.x), "\n")
  cat("coordinates of the identified point y: ", round(pos.rel.y), "\n")
}
my.get.coord()

Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] locator() in a multi-figure setting using mfrow()
Hi, based on some code from Thomas Petzoldt (see below), I have a question about how to use locator() in an mfrow() multi-figure setting. I am sure this is the kind of problem other people face too. We have 8 matrices, each with 10x10 fields, plotted as mfrow = c(2,4). How can I get, using locator(), the correct index of a field of one of the 8 matrices? That means I should get 3 values: the matrix I am in (a value between 1..8), and the corresponding x and y coordinates (each in 1..10). Many thanks for your kind help.

---
opar <- par(mfrow = c(2,4))
slices <- 8
m <- matrix(runif(100), 10, 10)
my.list <- list()
for (slice in 1:slices) {
  my.list[[slice]] <- m
}
for (slice in 1:slices) {
  x <- 1*(1:25)
  y <- 1*(1:25)
  z <- my.list[[slice]]
  image(list(x = 0:9, y = 0:9, z = z))
}
par(opar)  # restore device parameters
p <- locator(1)
c(round(p$x), round(p$y))
---

how can I get the correct location in the sense of a 3d info: (a) which slice (p$slice), (b) p$x, (c) p$y, so that it could be used in the sense of my.list[[p$slice]][round(p$x), round(p$y)]? christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] locator() in a multi-figure setting using mfrow()
I know that I can use par(mfg = c(i,j)) to get the correct x,y coordinates of one of the 8 matrices/subimages, but how can I get the i and the j, i.e. how can I know in which of the 8 images I am clicking? thanks, Christoph

Christoph Lehmann wrote: Hi, based on some code from Thomas Petzoldt (see below), I have a question about how to use locator() in an mfrow() multi-figure setting. I am sure this is the kind of problem other people face too. We have 8 matrices, each with 10x10 fields, plotted as mfrow = c(2,4). How can I get, using locator(), the correct index of a field of one of the 8 matrices? That means I should get 3 values: the matrix I am in (a value between 1..8), and the corresponding x and y coordinates (each in 1..10). Many thanks for your kind help.

---
opar <- par(mfrow = c(2,4))
slices <- 8
m <- matrix(runif(100), 10, 10)
my.list <- list()
for (slice in 1:slices) {
  my.list[[slice]] <- m
}
for (slice in 1:slices) {
  x <- 1*(1:25)
  y <- 1*(1:25)
  z <- my.list[[slice]]
  image(list(x = 0:9, y = 0:9, z = z))
}
par(opar)  # restore device parameters
p <- locator(1)
c(round(p$x), round(p$y))
---

how can I get the correct location in the sense of a 3d info: (a) which slice (p$slice), (b) p$x, (c) p$y, so that it could be used in the sense of my.list[[p$slice]][round(p$x), round(p$y)]? christoph

__ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] huge array with floats: allocation error
Hi, after searching through a couple of documents and the mailing list, I dare to ask it here. I need to define an array of size 64 x 64 x 16 x 1000 for single-precision floating-point numbers. With 1G RAM I always get the error:

cannot allocate vector of size 458752 Kb
reached total allocation of 1022MB: see help(memory.size)

I consulted memory.size() but it didn't help me. So my question: I know that there is NO float type in R. Is there any way to solve my problem without increasing the RAM? many thanks, Cheers! Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] huge array with floats: allocation error
Thank you, Prof. Ripley.

"I don't believe you read the rw-FAQ as the posting guide asks, though. You seem to be working under Windows, without saying so (and the posting guide does ask you to). So that's `a couples of documents' worth `searching through'."

I apologize for not being more precise. Fact is:

x <- rep(0, 64 * 64 * 16 * 1000)
dim(x) <- c(64, 64, 16, 1000)

indeed DOES work (I tried it on several machines):
(i) a Linux SuSE 9.1 box with 1G RAM
(ii) Windows XP with 1G RAM
(iii) Windows XP with 2G RAM
all with R version 1.9.1. But:

tst.array <- array(0, c(64, 64, 16, 1000))
Error: cannot allocate vector of size 512000 Kb

does not work on any of these machine/RAM combinations. Why is this? Many thanks for a further hint. I am sure I overlooked something very basic; forgive me. Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
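A plausible explanation (my reading, not a reply from the thread): array(0, dim) first builds the full data vector via rep() and then copies it into the array it returns, so at its peak it needs room for two copies of the 512000 Kb vector, while assigning dim() to an already allocated vector only attaches an attribute and needs no second copy. A scaled-down sketch of the two variants (smaller dimensions so it runs anywhere):

```r
# Variant 1: one allocation, then set the dim attribute in place
x <- rep(0, 64 * 64 * 16 * 10)
dim(x) <- c(64, 64, 16, 10)

# Variant 2: array() builds the same object, but via an intermediate
# copy, briefly doubling the memory needed
y <- array(0, c(64, 64, 16, 10))
```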
[R] association rules in R
Hi I am interested in data mining problems. Has anybody ever programmed and worked with association rules in R? I am very grateful for any hint. Best regards Christoph __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] vectorizing a matrix computation
Dear R users,

I have a 4-dimensional array (actually several 3d (x, y, slices) matrices appended over time (volumes)). Say I want to z-transform the data (subtract the mean and divide by the standard deviation) over the time dimension:

for (slice in 1:slices) {
  for (x in 1:x.dim) {
    for (y in 1:y.dim) {
      t <- as.matrix(my.matrix[x,y,slice,1:volumes])
      for (vol in 1:volumes) {
        my.matrix.transformed[x,y,slice,vol] <-
          (my.matrix[x,y,slice,vol] - mean(t))/sqrt(var(t))
      }
    }
  }
}

How can I vectorize such a function using one of the *apply functions? many thanks, Cheers, Christoph __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
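One way to vectorize the loops above (a sketch with simulated data, not from a reply in the thread): compute the mean and standard deviation once per (x, y, slice) cell with apply() over margins 1:3, then subtract and divide with sweep() over the same margins:

```r
# z-transform over the 4th (volume/time) dimension, vectorized
set.seed(42)
my.matrix <- array(rnorm(4 * 3 * 2 * 10), c(4, 3, 2, 10))

m <- apply(my.matrix, 1:3, mean)   # mean of each time series
s <- apply(my.matrix, 1:3, sd)     # sd of each time series
my.matrix.transformed <- sweep(sweep(my.matrix, 1:3, m, "-"), 1:3, s, "/")
```

After this, every time series my.matrix.transformed[x, y, slice, ] has mean 0 and sd 1.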
Re: [R] multiple hierarchical anova models
Hi, I can recommend two documents:
a) http://www.psych.upenn.edu/~baron/rpsych/rpsych.html
b) http://www.pallier.org/ressources/stats_with_R/stats_with_R.pdf (in French)
Let me know whether this helped you. cheers, christoph

Matthias Unterhuber wrote: Hello, my name is Matthias and I am looking for syntax regarding hierarchical anova models in R. How can I express that a factor is nested within the combination of two other factors, A(B,C), e.g. for aov(...)? I did not find the corresponding expression. Furthermore, I wanted to ask whether block factors have to be specified in a specific way, or whether they are just treated as other factors (with no interactions). In general, an overview would be useful for beginners that describes the structural equations of more complicated anova designs (hierarchical and block-factor designs...) in the syntax of R. Best wishes and thanks, Matthias

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] locator() in a multiple plot setting
Hi, based on some code from Thomas Petzoldt, I have a question:

---
opar <- par(mfrow = c(2,4))
slices <- 8
m <- matrix(runif(100), 10, 10)
my.list <- list()
for (slice in 1:slices) {
  my.list[[slice]] <- m
}
for (slice in 1:slices) {
  x <- 1*(1:25)
  y <- 1*(1:25)
  z <- my.list[[slice]]
  image(list(x = 0:9, y = 0:9, z = z))
}
par(opar)  # restore device parameters
p <- locator(1)
c(round(p$x), round(p$y))
---

how can I get the correct location in the sense of a 3d info: (a) which slice (p$slice), (b) p$x, (c) p$y, so that it could be used in the sense of my.list[[p$slice]][round(p$x), round(p$y)]? many thanks, Cheers, christoph __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] paired t-test with bootstrap
just a hint for further bootstrapping examples (worked out with R): "Bootstrap Methods and Their Application" by A.C. Davison and D.V. Hinkley. cheers, christoph

luciana wrote: Dear Sirs, I am an R beginner: by means of R I would like to apply the bootstrap to my data in order to test cost differences between independent or paired samples of people affected by a certain disease. My problem is that even though I am reading the book by Efron (An Introduction to the Bootstrap), looking at the examples on the internet and available in R, and learning a lot of theoretical things about the bootstrap, I can't apply the bootstrap with R to my data because of many doubts and difficulties. This is the reason why I have decided to ask the experts for help. I have a sample of diabetic people, matched (by age and sex) with a control sample. The variable I would like to compare is their monthly drug and hospital cost. The cost variable has a distribution very far from Gaussian, but I nevertheless need to compare the means between the two groups. So, in the specific case of a paired-sample t-test, I aim at testing whether the difference in cost is close to 0. What is the better way to follow for that? Another question is that sometimes I have missing data in my dataset (for example, I have the cost for a patient but not for a control). If I introduce NA or a dot, R doesn't estimate the statistic I need (for instance, the mean). To overcome this problem I have replaced the missing data with the mean computed from the remaining part of the data. Anyway, I think R can actually compute the mean even in the presence of missing data. Is that right? What can I do? Thank you very much for your attention and, I hope, your help. Best wishes, Luciana Scalone, Center of Pharmacoeconomics, University of Milan

[[alternative HTML version deleted]]

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] pixmapIndexed color question
Hi, I use pixmapIndexed:

tmp.vimp <- array(0, c(x.dim, y.dim))
tmp.vimp <- pixmapIndexed(tmp.vimp, col = rainbow)

to plot the values of a 2D matrix. I 'fill' the pixmapIndexed like:

for (x in 1:x.dim) {
  for (y in 1:y.dim) {
    tmp.vimp@index[x,y] <- my.matrix[x,y]
  }
}

How can I specify that the colors are painted e.g. according to the rainbow palette? plot(tmp.vimp) paints all 'pixels' in red, even though I specified col = rainbow (see above). many thanks, cheers, christoph

p.s. is there an easier method for 'painting' the values of a 2d matrix? __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
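On the postscript question, a simpler sketch (my suggestion, not a reply from the thread): image() paints a matrix directly and takes a vector of colors, so no pixmap object is needed. Note that rainbow is a function and has to be called, e.g. rainbow(64), to produce that color vector:

```r
# Paint the values of a 2D matrix directly with image();
# col wants a color vector, hence rainbow(64), not rainbow
my.matrix <- outer(1:20, 1:30, function(x, y) sin(x/3) * cos(y/5))

# transpose and flip rows so the plot appears in matrix reading order
image(t(my.matrix)[, nrow(my.matrix):1], col = rainbow(64))
```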
[R] image NAs error
tmp.vimp <- matrix(NA, nrow = x.dim, ncol = y.dim)
tmp.vimp <- image(tmp.vimp, col = rainbow)

gives:

Error in image.default(tmp.vimp, col = rainbow) : invalid z limits
In addition: Warning messages:
1: no finite arguments to min; returning Inf
2: no finite arguments to max; returning -Inf

even though NAs are allowed in image. What went wrong here? thank you, Christoph __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
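The proximate cause (my reading of the error messages, not a reply from the thread): NAs are allowed in image(), but with an all-NA matrix there is no finite data range from which to derive the z limits, so zlim must be supplied explicitly; NA cells are then simply left blank. Also note that col wants a vector of colors such as rainbow(12), not the rainbow function itself:

```r
# An all-NA matrix has no finite range, so supply zlim yourself
tmp.vimp <- matrix(NA_real_, nrow = 10, ncol = 10)
tmp.vimp[3, 4] <- 0.5                     # one finite cell for illustration
image(tmp.vimp, col = rainbow(12), zlim = c(0, 1))
```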
[R] sweave: graphics not at the expected location in the pdf
Hi, I use Sweave for excellent pdf output (thank you, Friedrich Leisch). I have just one problem: quite often the graphics are not at the place where I expect them, but appear later in the pdf, often on a separate page. How can I fix this, i.e. how can I specify that I want a graphic exactly here and now in the document? Many thanks and best regards, Christoph -- Christoph Lehmann [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
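A common fix (standard LaTeX practice, not an archived reply): float placement is decided on the LaTeX side, so wrap the Sweave figure chunk in a figure environment and pin it down, e.g. with the float package's [H] specifier. The chunk name and plot are placeholders:

```latex
% In the .Rnw preamble:
\usepackage{float}

% Around the figure chunk:
\begin{figure}[H]
<<myplot, fig=TRUE, echo=FALSE>>=
plot(rnorm(100))
@
\caption{This figure stays exactly here instead of floating to a later page.}
\end{figure}
```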
Re: [R] Visual stimulus presentation using R?
Hi Christoph, I have never done such stuff with R (and I don't know, e.g., how good the timing would be, in case you are interested in reaction times and so on), but have a look at www.visionegg.org. It's a stimulus-presentation framework, written by Andrew Straw entirely in Python, and it has lots of great features. cheers, Christoph. P.S. please let me know if you succeed with a solution in R!

On Mon, 2004-06-21 at 15:27, Christoph Lange wrote: Dear all! Although the Psycho-Toolbox for Matlab is free software, Matlab isn't. I'm planning to do an experiment where it's essential to travel to the subjects, not let the subjects come to where the Matlab licences are :-( So I need to use free software for my experiment if I don't want to buy an extra Matlab licence (which I don't want to). Did anyone ever try to do presentation of visual stimuli (images, practically, with a little bit of text in my case) with R? I looked into the documentation of rgl, but what's lacking there is (as far as I saw) the possibility to also read (unbuffered) keyboard input. So what I need is: 1. put images onto the (full!)screen (quick), 2. read keyboard input, 3. write results (to an R structure, presumably). Any idea, suggestion? Cheers, Christoph.

-- Christoph Lehmann [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] lme newbie question
Hi, I try to implement a simple 2-factorial repeated-measures anova in the lme framework and would be grateful for a short feedback.

- my dependent var is a reaction time (rt),
- as independent vars I have:
  - the age group (0/1) the subject belongs to (so this is a between-subject factor), and
  - two WITHIN-subject experimental conditions, one (angles) having 5 and the other (hands) 3 factor levels; i.e. each subject performs on 3 * 5 = 15 different task difficulties.

Am I right with this lme implementation, when I want to investigate the influence of the age group and the two conditions on rt:

my.lme <- lme(rt ~ age.group + angles * hands, data = my.data, random = ~ 1 | subject)

Then I think I would have to compare the model above with a more elaborate one, including more interactions:

my.lme2 <- lme(rt ~ age.group * angles * hands, data = my.data, random = ~ 1 | subject)

and compare them by performing a likelihood-ratio test, yes? I think, if I would like to generalize the influence of the experimental conditions on rt, I should define angles and hands as random effects, yes? Thanks for a short feedback. It seems repeated-measures anovas aren't a trivial topic in R :) Cheers! Christoph -- Christoph Lehmann [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
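The likelihood-ratio comparison described above can be sketched as follows (my sketch with simulated stand-in data, since the poster's my.data is not available; all variable names are the poster's, the method = "ML" choice is mine: REML fits with different fixed effects must not be compared by a likelihood-ratio test):

```r
library(nlme)   # lme(); nlme ships with R

set.seed(1)
# simulated stand-in for the poster's data
my.data <- expand.grid(subject = factor(1:20),
                       angles  = factor(1:5),
                       hands   = factor(1:3))
my.data$age.group <- factor(as.integer(my.data$subject) %% 2)
my.data$rt <- rnorm(nrow(my.data), mean = 500, sd = 50)

# both models fitted with ML so the likelihood-ratio test is valid
my.lme  <- lme(rt ~ age.group + angles * hands, data = my.data,
               random = ~ 1 | subject, method = "ML")
my.lme2 <- lme(rt ~ age.group * angles * hands, data = my.data,
               random = ~ 1 | subject, method = "ML")

anova(my.lme, my.lme2)  # likelihood-ratio test of the extra interactions
```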
[R] classification with nnet: handling unequal class sizes
Dear Prof. Ripley,

Since you are the creator of the MASS library I dare to ask you a short question, for which I didn't get an answer from the R mailing list. If you feel disturbed by my question, please forgive me and just ignore my mail.

I use the nnet code from your book, VR p. 348: the very nice and general function CVnn2() to choose the number of hidden units and the amount of weight decay by an inner cross-validation, with a slight modification to use it for classification (see below). My data has 2 classes of unequal size: 45 observations for classI and 116 for classII (number of variables: 39). With CVnn2 I get the following confusion matrix (in %, average of 10 runs):

        predicted
true    53  47
        16  84

I had a similarly biased confusion matrix with randomForest until I used the sampsize argument (the same holds for svm until I used the class.weights argument). How can I handle this problem of unequal class sizes with nnet, in order to get a less biased confusion matrix? (With randomForest I finally got

        78  22
        16  84 .)

many thanks for a hint. By the way, I just want to say 'thank you' for your great MASS book. Since your first recommendation, I consult it quite frequently. Christoph

#--- neural networks
# a classification network is constructed; this has one output and entropy
# fit if the number of levels is two, and a number of outputs equal to the
# number of classes and a softmax output stage for more levels.
# -> therefore two lines of Prof. Ripley's wrapper function are changed below
# (original commented out) and an additional function has been introduced
# (resmatrix)

con <- function(...) {
  print(tab <- table(...))
  diag(tab) <- 0
  cat("error rate = ", round(100*sum(tab)/length(list(...)[[1]]), 2), "%\n")
  invisible()
}

CVnn2 <- function(formula, data, size = c(0,4,4,10,10),
                  lambda = c(0, rep(c(0.001, 0.01), 2)),
                  nreps = 1, nifold = 5, verbose = 99, ...)
{
  resmatrix <- function(predict.matrix, learn, data, ri, i) {
    rae.matrix <- predict.matrix
    rae.matrix[,] <- 0
    rae.vector <- as.numeric(as.factor(predict(learn, data[ri == i,], type = "class")))
    for (k in 1:dim(rae.matrix)[1]) {
      if (rae.vector[k] == 1) rae.matrix[k,1] <- rae.matrix[k,1] + 1
      else rae.matrix[k,2] <- rae.matrix[k,2] + 1
    }
    rae.matrix
  }
  CVnn1 <- function(formula, data, nreps = 1, ri, verbose, ...) {
    totalerror <- 0
    truth <- data[, deparse(formula[[2]])]
    res <- matrix(0, nrow(data), length(levels(truth)))
    if (verbose > 20) cat("  inner fold")
    for (i in sort(unique(ri))) {
      if (verbose > 20) cat(" ", i, sep = "")
      for (rep in 1:nreps) {
        learn <- nnet(formula, data[ri != i,], trace = F, ...)
        #res[ri == i,] <- res[ri == i,] + predict(learn, data[ri == i,])
        res[ri == i,] <- res[ri == i,] + resmatrix(res[ri == i,], learn, data, ri, i)
      }
    }
    if (verbose > 20) cat("\n")
    sum(as.numeric(truth) != max.col(res/nreps))
  }
  truth <- data[, deparse(formula[[2]])]
  res <- matrix(0, nrow(data), length(levels(truth)))
  choice <- numeric(length(lambda))
  for (i in sort(unique(rand))) {
    if (verbose > 0) cat("fold ", i, "\n", sep = "")
    ri <- sample(nifold, sum(rand != i), replace = T)
    for (j in seq(along = lambda)) {
      if (verbose > 10) cat("  size =", size[j], "decay =", lambda[j], "\n")
      choice[j] <- CVnn1(formula, data[rand != i,], nreps = nreps, ri = ri,
                         size = size[j], decay = lambda[j], verbose = verbose, ...)
    }
    decay <- lambda[which.is.max(-choice)]
    csize <- size[which.is.max(-choice)]
    if (verbose > 5) cat("  #errors:", choice, " ")
    # if (verbose > 1) cat("  chosen size = ", csize, " decay = ", decay, "\n", sep = "")
    for (rep in 1:nreps) {
      learn <- nnet(formula, data[rand != i,], trace = F, size = csize, decay = decay, ...)
      #res[rand == i,] <- res[rand == i,] + predict(learn, data[rand == i,])
      res[rand == i,] <- res[rand == i,] + resmatrix(res[rand == i,], learn, data, rand, i)
    }
  }
  factor(levels(truth)[max.col(res/nreps)], levels = levels(truth))
}

-- Christoph Lehmann, Department of Psychiatric Neurophysiology, University Hospital of Clinical Psychiatry, Waldau, CH-3000 Bern 60; Phone: ++41 31 930 93 83, Mobile: ++41 76 570 28 00, Fax: ++41 31 930 99 61, [EMAIL PROTECTED], http://www.puk.unibe.ch/cl/pn_ni_cv_cl_03.html

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
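One way to build the weight vector alluded to above (my sketch of the standard inverse-class-frequency scheme, not code from the book or the thread): give each observation a case weight inversely proportional to the size of its class, so that both classes contribute the same total weight to the fit:

```r
# Case weights inversely proportional to class frequency
# (GROUP stands for the 2-level class factor of a training bag)
GROUP <- factor(c(rep("classI", 45), rep("classII", 116)))
tab <- table(GROUP)
w <- as.numeric(length(GROUP) / (length(tab) * tab[as.character(GROUP)]))

# each class now carries the same total weight:
tapply(w, GROUP, sum)   # both sums equal length(GROUP)/2
```

Such a vector can be passed as the weights argument of nnet() in place of the rep(1, ...) placeholder.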
[R] classification with nnet: handling unequal class sizes
I hope this question is adequate for this list. I use the nnet code from V&R, p. 348: the very nice and general function CVnn2(), which chooses the number of hidden units and the amount of weight decay by an inner cross-validation, with a slight modification to use it for classification (see below).

My data has 2 classes of unequal size: 45 observations for class I and 116 observations for class II. With CVnn2() I get the following confusion matrix (%, average of 10 runs):

           predicted
true     classI  classII
classI     53      47
classII    16      84

I had a similarly biased confusion matrix with randomForest until I used the sampsize argument (the same holds for svm until I used the class.weights argument). How can I handle this problem of unequal class sizes with nnet, in order to get a less biased confusion matrix? (With randomForest I finally got

           predicted
true     classI  classII
classI     78      22
classII    16      84 )

many thanks for a hint
Christoph

#--- neural networks
# A classification network is constructed; this has one output and entropy
# fit if the number of levels is two, and a number of outputs equal to the
# number of classes and a softmax output stage for more levels. Therefore two
# lines of Prof. Ripley's wrapper function are changed below (original
# commented out) and an additional function has been introduced (resmatrix).

con <- function(...)
{
    print(tab <- table(...))
    diag(tab) <- 0
    cat("error rate = ",
        round(100*sum(tab)/length(list(...)[[1]]), 2), "%\n")
    invisible()
}

CVnn2 <- function(formula, data,
                  size = c(0,4,4,10,10),
                  lambda = c(0, rep(c(0.001, 0.01), 2)),
                  nreps = 1, nifold = 5, verbose = 99, ...)
{
    resmatrix <- function(predict.matrix, learn, data, ri, i)
    {
        rae.matrix <- predict.matrix
        rae.matrix[,] <- 0
        rae.vector <- as.numeric(as.factor(predict(learn, data[ri == i,],
                                                   type = "class")))
        for (k in 1:dim(rae.matrix)[1]) {
            if (rae.vector[k] == 1) rae.matrix[k,1] <- rae.matrix[k,1] + 1
            else rae.matrix[k,2] <- rae.matrix[k,2] + 1
        }
        rae.matrix
    }
    CVnn1 <- function(formula, data, nreps = 1, ri, verbose, ...)
    {
        totalerror <- 0
        truth <- data[, deparse(formula[[2]])]
        res <- matrix(0, nrow(data), length(levels(truth)))
        if(verbose > 20) cat("  inner fold")
        for (i in sort(unique(ri))) {
            if(verbose > 20) cat(" ", i, sep = "")
            for(rep in 1:nreps) {
                learn <- nnet(formula, data[ri != i,], trace = FALSE, ...)
                # res[ri == i,] <- res[ri == i,] + predict(learn, data[ri == i,])
                res[ri == i,] <- res[ri == i,] +
                    resmatrix(res[ri == i,], learn, data, ri, i)
            }
        }
        if(verbose > 20) cat("\n")
        sum(as.numeric(truth) != max.col(res/nreps))
    }
    truth <- data[, deparse(formula[[2]])]
    res <- matrix(0, nrow(data), length(levels(truth)))
    choice <- numeric(length(lambda))
    # 'rand' (the outer fold assignment) is assumed to exist in the calling
    # environment, as in the V&R scripts
    for (i in sort(unique(rand))) {
        if(verbose > 0) cat("fold ", i, "\n", sep = "")
        ri <- sample(nifold, sum(rand != i), replace = TRUE)
        for(j in seq(along = lambda)) {
            if(verbose > 10)
                cat("  size =", size[j], "decay =", lambda[j], "\n")
            choice[j] <- CVnn1(formula, data[rand != i,], nreps = nreps,
                               ri = ri, size = size[j], decay = lambda[j],
                               verbose = verbose, ...)
        }
        decay <- lambda[which.is.max(-choice)]
        csize <- size[which.is.max(-choice)]
        if(verbose > 5) cat("  #errors:", choice, "  ")
        if(verbose > 1)
            cat("chosen size = ", csize, " decay = ", decay, "\n", sep = "")
        for(rep in 1:nreps) {
            learn <- nnet(formula, data[rand != i,], trace = FALSE,
                          size = csize, decay = decay, ...)
            # res[rand == i,] <- res[rand == i,] + predict(learn, data[rand == i,])
            res[rand == i,] <- res[rand == i,] +
                resmatrix(res[rand == i,], learn, data, rand, i)
        }
    }
    factor(levels(truth)[max.col(res/nreps)], levels = levels(truth))
}

-- Christoph Lehmann
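One common remedy, not discussed in the post itself, is to give each class the same total influence via nnet's weights argument (case weights). A hedged sketch on made-up data with the same 45/116 imbalance (all data and parameter values here are hypothetical):

```r
library(nnet)
set.seed(1)
# hypothetical two-class data with a 45/116 imbalance like the poster's
n1 <- 45; n2 <- 116
dat <- data.frame(x1  = c(rnorm(n1, 1), rnorm(n2, -1)),
                  x2  = c(rnorm(n1, 1), rnorm(n2, -1)),
                  cls = factor(rep(c("classI", "classII"), c(n1, n2))))
# inverse-frequency case weights: both classes get the same total weight
w <- as.numeric(1 / table(dat$cls)[as.character(dat$cls)])
fit <- nnet(cls ~ x1 + x2, data = dat, weights = w,
            size = 2, decay = 0.01, maxit = 200, trace = FALSE)
table(true = dat$cls, predicted = predict(fit, dat, type = "class"))
```

The same weights vector can be passed through CVnn2's `...` argument, since it forwards extra arguments to nnet().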
[R] pca scores for newdata
Hi, I used princomp() on a dataset x[!sub,]. How can I get the scores for another dataset, say x[sub,]? I didn't succeed using predict().

thanks for a hint
cheers
christoph
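For the record, predict() does work on princomp objects when newdata has the same column names as the training data; a minimal sketch with made-up data ('sub' assumed to be a logical row index, as in the post):

```r
set.seed(1)
x <- as.data.frame(matrix(rnorm(200), 50, 4))  # made-up stand-in for the poster's data
sub <- seq_len(nrow(x)) <= 10
pc <- princomp(x[!sub, ])
# predict() applies the training centering/scaling before projecting
# onto the loadings, so the scores are comparable to pc$scores
new.scores <- predict(pc, newdata = x[sub, ])
dim(new.scores)  # 10 rows, one column per component
```

A frequent pitfall is passing a matrix whose columns are named differently from those used in the fit; predict.princomp matches newdata columns by name.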
[R] PLS scores for newdata
Hi, I used pls.pcr on a dataset x[!sub,]. How can I get the scores for another dataset, say x[sub,]? I couldn't reproduce the scores for the training set by multiplying the data with the loadings, even after first scaling the data with scale(). pls.pcr only gives the XScores for the training data.

many thanks for a hint
cheers
christoph
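A likely reason the manual multiplication fails is that the new data must be centered (and scaled, if applicable) with the *training* statistics, not its own. A generic sketch, explicitly not the pls.pcr API, using an SVD-based stand-in for a loadings matrix:

```r
# generic sketch (NOT the pls.pcr API): scores for new data require applying
# the training centering before multiplying by the loadings matrix P
set.seed(1)
Xtrain <- matrix(rnorm(100), 20, 5)   # made-up training data
Xnew   <- matrix(rnorm(25), 5, 5)     # made-up new data
ctr <- colMeans(Xtrain)
s <- svd(scale(Xtrain, center = ctr, scale = FALSE))
P <- s$v[, 1:2]                       # stand-in for 2 latent-variable loadings
train.scores <- scale(Xtrain, center = ctr, scale = FALSE) %*% P
new.scores   <- scale(Xnew,  center = ctr, scale = FALSE) %*% P
```

If scale() was applied with its own (new-data) column means, the reproduced training scores will not match the ones the fitting function reports.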
[R] my own function given to lapply
Hi, it seems I just miss something. I defined

treshold <- function(pred)
{
  if (pred < 0.5) pred <- 0
  else pred <- 1
  return(pred)
}

and want to apply it on a vector:

sapply(mylist[,,3], threshold)

but I get:

Error in match.fun(FUN) : Object "threshold" not found

thanks for help
cheers
chris
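The error comes from the spelling mismatch: the function is defined as 'treshold' but called as 'threshold'. With the name fixed, sapply() works; for a numeric vector a vectorized ifelse() avoids the element-wise loop entirely:

```r
# name fixed; the body can also be written without assignment
threshold <- function(pred) {
  if (pred < 0.5) 0 else 1
}

preds <- c(0.2, 0.7, 0.5)     # made-up example vector
sapply(preds, threshold)      # 0 1 1
ifelse(preds < 0.5, 0, 1)     # same result, vectorized
```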
[R] nlme vs aov with Error() for an ANCOVA
Hi, I computed a multiple linear regression with repeated measures on one explanatory variable: BOLD peak (blood oxygenation) as dependent variable, and as independent variables: age.group (binary: young(0)/old(1)) and task difficulty, measured by means of the reaction time 'rt'. For 'rt' I have repeated measurements, since each subject did 12 different tasks, so it can be seen as an ANCOVA:

subject  age.group  bold  rt
subj1    0          0.08  0.234
subj1    0          0.05  0.124
..
subj1    0          0.07  0.743
subj2    0          0.06  0.234
subj2    0          0.02  0.183
..
subj2    0          0.05  0.532
subjn    1          0.09  0.234
subjn    1          0.06  0.155
..
subjn    1          0.07  0.632

I decided to use the nlme library:

patrizia.lme <- lme(bold ~ rt*age.group, data = patrizia.data1,
                    random = ~ rt | subject)
summary(patrizia.lme)

Linear mixed-effects model fit by REML
 Data: patrizia.data1
       AIC      BIC    logLik
  272.2949 308.3650 -128.1474

Random effects:
 Formula: ~rt | subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr
(Intercept) 0.2740019518 (Intr)
rt          0.0004756026 -0.762
Residual    0.2450787149

Fixed effects: bold ~ rt + age.group + rt:age.group
                   Value  Std.Error  DF   t-value p-value
(Intercept)   0.06109373 0.11725208 628  0.521046  0.6025
rt            0.00110117 0.00015732 628  6.999501  0.0000
age.group    -0.03750787 0.13732793  43 -0.273126  0.7861
rt:age.group -0.00031919 0.00018259 628 -1.748115  0.0809
 Correlation:
             (Intr) rt     ag.grp
rt           -0.818
age.group    -0.854  0.698
rt:age.group  0.705 -0.862 -0.805

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max
-3.6110596 -0.5982741 -0.0408144  0.5617381  4.8648242

Number of Observations: 675
Number of Groups: 45
#--- end output

If the model assumptions hold, this means we don't have a significant age effect, but a highly significant task effect, and the interaction is significant at the 0.1 level. I am now interested whether one could do the analysis also using aov() and the Error() option, e.g. may I do:

l <- aov(bold ~ rt*age.group + Error(subject/rt), data = patrizia.data1)
summary(l)

Error: subject
   Df    Sum Sq   Mean Sq
rt  1 0.0022087 0.0022087

Error: subject:rt
   Df Sum Sq Mean Sq
rt  1 40.706  40.706

Error: Within
              Df  Sum Sq Mean Sq F value    Pr(>F)
rt             1   2.422   2.422 10.0508  0.001592 **
age.group      1   8.722   8.722 36.2022 2.929e-09 ***
rt:age.group   1   0.277   0.277  1.1494  0.284060
Residuals    669 161.187   0.241
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

which looks weird. Or what would you recommend?

thanks a lot
Christoph
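A hedged sketch on simulated data (not the poster's): with a continuous within-subject covariate such as rt, a per-subject random intercept is usually written Error(subject) rather than Error(subject/rt), and can be compared directly against the corresponding random-intercept lme() fit. The data-generating values below are invented:

```r
library(nlme)
set.seed(1)
d <- data.frame(subject   = factor(rep(1:10, each = 12)),
                age.group = rep(0:1, each = 60),   # between-subject factor
                rt        = runif(120, 100, 800))  # within-subject covariate
# simulate a true rt slope of 0.001 plus a per-subject intercept shift
d$bold <- 0.05 + 0.001 * d$rt + rep(rnorm(10, sd = 0.02), each = 12) +
          rnorm(120, sd = 0.02)
a <- aov(bold ~ rt * age.group + Error(subject), data = d)
l <- lme(bold ~ rt * age.group, random = ~ 1 | subject, data = d)
summary(a)
summary(l)
```

The two fits should give closely matching fixed-effect tests for this balanced design; whether a random rt slope (as in the poster's random = ~ rt | subject) is needed is a separate modelling question.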
[R] array(list(),c(2,5)) gives error in R 1.8.1
Hi, in R 1.7 the following worked fine:

> array(list(), c(2,5))
     [,1] [,2] [,3] [,4] [,5]
[1,] NULL NULL NULL NULL NULL
[2,] NULL NULL NULL NULL NULL

Now in R 1.8.1 I get the error:

Error in rep.int(data, t1) : invalid number of copies in rep
In addition: Warning message:
NAs introduced by coercion

thanks for help; I need this possibility for storing objects (lm results) in an array.

cheers
Christoph
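A workaround sketch that sidesteps array()'s recycling of the (empty) data argument: build a list of the right length first, then set its dim attribute:

```r
# a list can carry a dim attribute, giving an "array of lists"
res <- vector("list", 2 * 5)
dim(res) <- c(2, 5)
# store a model object in one cell (cars is a built-in data set)
res[[1, 3]] <- lm(dist ~ speed, data = cars)
class(res[[1, 3]])  # "lm"
```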
[R] comparing classification methods: 10-fold cv or leaving-one-out ?
Hi, what would you recommend to compare classification methods such as LDA, classification trees (rpart), bagging, SVM, etc.: 10-fold CV (as in Ripley, p. 346f) or leave-one-out (as e.g. implemented in lda())? My data set is not that huge (roughly 200 entries).

many thanks for a hint
Christoph
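Whichever scheme is chosen, both are easy to set up; a sketch of 10-fold CV with lda() on a built-in data set (iris used purely for illustration), with leave-one-out shown via lda's built-in CV = TRUE:

```r
library(MASS)
set.seed(1)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(iris)))
cv.err <- sapply(1:k, function(i) {
  fit <- lda(Species ~ ., data = iris[folds != i, ])
  mean(predict(fit, iris[folds == i, ])$class != iris$Species[folds == i])
})
mean(cv.err)                     # 10-fold CV error estimate
loo <- lda(Species ~ ., data = iris, CV = TRUE)
mean(loo$class != iris$Species)  # leave-one-out error
```

With only ~200 observations, repeating the 10-fold split over several random fold assignments and averaging reduces the variance of the estimate.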
[R] lda() called with data=subset() command
Hi, I have a data.frame with a grouping variable having the levels "C", "mild AD", "mod AD", "O" and "S". Since I want to compute an lda only for the two groups 'C' and 'mod AD', I call lda with data = subset(mydata.pca, GROUP == 'mod AD' | GROUP == 'C'):

my.lda <- lda(GROUP ~ Comp.1 + Comp.2 + Comp.3 + Comp.4 + Comp.5 +
              Comp.6 + Comp.7 + Comp.8,
              data = subset(mydata.pca, GROUP == 'mod AD' | GROUP == 'C'),
              CV = TRUE)

This results in the warning

group(s) mild AD O S are empty in: lda.default(x, grouping, ...)

of course... my.lda$class now shows

 [1] C       C       C       C       C       C       C       C       C
[10] C       C       C       C       C       C       C       C       C
[19] C       C       C       mild AD mild AD mild AD mild AD mild AD mild AD
[28] mild AD C       mild AD mild AD mild AD C       C       mild AD mild AD
[37] mild AD mild AD
Levels: C mild AD mod AD O S

It seems it just took the second level (mild AD) for the second class, even though that level was not used for the lda computation (only the first level (C) and the third level (mod AD) were). What shall I do to resolve this (little) problem?

thanks for a hint
christoph
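The usual fix is to re-apply factor() to the subsetted grouping variable so the empty levels are dropped before lda() sees it; a sketch with made-up data standing in for mydata.pca:

```r
library(MASS)
set.seed(1)
# hypothetical stand-in for the poster's data frame
d <- data.frame(GROUP = factor(rep(c("C", "mild AD", "mod AD"), each = 10)),
                Comp.1 = rnorm(30), Comp.2 = rnorm(30))
sub <- subset(d, GROUP == 'mod AD' | GROUP == 'C')
sub$GROUP <- factor(sub$GROUP)   # drops the empty 'mild AD' level
fit <- lda(GROUP ~ Comp.1 + Comp.2, data = sub, CV = TRUE)
levels(fit$class)                # "C" "mod AD"
```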
[R] lmList error if NA in variable not used
I try to fit an lmList model. If a variable (a column) in the data frame used has some NA values, lmList gives an error, even though this variable is not used in the model:

Error in na.fail.default(data) : missing values in object

Why? Or what is my mistake?

thanks
cheers
christoph
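The reason is that nlme's lmList() applies its na.action (default na.fail) to the whole data frame, not just the model variables, so an NA anywhere triggers the error. Passing na.action = na.omit, or dropping the offending column first, works around it; a sketch with made-up data:

```r
library(nlme)
set.seed(1)
d <- data.frame(g = factor(rep(1:3, each = 4)),
                x = rnorm(12), y = rnorm(12),
                unused = c(NA, rnorm(11)))  # NA in a column not in the model
fits <- lmList(y ~ x | g, data = d, na.action = na.omit)
length(fits)  # one lm fit per group
```

Note that na.omit drops whole rows, so the row carrying the NA is lost even though only an unused column was missing; subsetting the columns (d[, c("g", "x", "y")]) keeps every observation.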
[R] partial proportional odds model (PPO)
Hi, since the 'equal slopes' assumption doesn't hold in my data, I cannot use a proportional odds model ('Design' library, together with 'Hmisc'). I would like to try therefore a partial proportional odds model. Please, could anybody tell me where to find the code and how to specify such a model, or any potential alternatives?

many thanks for your kind help
christoph
[R] R^2 analogue in polr() and prerequisites for polr()
Hi, (1) in polr(), is there any way to calculate a pseudo-analogue to R^2, just for use as a purely descriptive statistic of the goodness of fit? (2) And: what are the assumptions which must be fulfilled so that the results of polr() (t-values, etc.) are valid? How can I test these prerequisites most easily? I have a three-level (ordered factor) response and four metric variables.

many thanks
Christoph
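For (1), one common descriptive choice (my suggestion, not from the thread) is a McFadden-style pseudo-R^2, one minus the ratio of the fitted deviance to the intercept-only deviance; a sketch on the MASS 'housing' data:

```r
library(MASS)
fit  <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
null <- polr(Sat ~ 1, weights = Freq, data = housing)
# McFadden-style pseudo-R^2 from the residual deviances
pseudoR2 <- 1 - fit$deviance / null$deviance
pseudoR2
```

It is purely descriptive and does not share the variance-explained interpretation of the linear-model R^2.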
Re: [R] R^2 analogue in polr() and prerequisites for polr()
many thanks! I was just asking for an R-square analogue since the students I will present the results to might like to know how the measure of fit in an ordinal regression (e.g. the residual deviance) compares to measures they know from introductory courses on linear regression (such as R-square), i.e. how much of the variance of the dependent variable can be explained by the variance of the independent variables.

thanks and best regards
christoph

On Tue, 2003-12-09 at 07:26, Prof Brian Ripley wrote:
> On 8 Dec 2003, Christoph Lehmann wrote:
> > (1) In polr(), is there any way to calculate a pseudo-analogue to R^2,
> > just for use as a purely descriptive statistic of the goodness of fit?
> First define the statistic you are interested in! There is an absolute
> measure of fit, the residual deviance.
> > (2) And: what are the assumptions which must be fulfilled so that the
> > results of polr() (t-values, etc.) are valid? How can I test these
> > prerequisites most easily? I have a three-level (ordered factor)
> > response and four metric variables.
> This is discussed with worked examples in the book that MASS supports,
> so please consult your copy.
[R] criterion for variable selection in LDA
Hi, since a stepwise procedure for variable selection (as e.g. in SPSS) for an LDA is not implemented in R, and anyway I cannot be sure that all the required assumptions for e.g. a procedure using a statistic based on Wilks' lambda hold (such as normality and variance homogeneity), I would like to ask what you would recommend: shall I e.g. define a criterion such as the error rate stemming from leave-one-out cross-validation and then write my own procedure for including/removing variables? Or what would be the gold standard for such a case? (My case: 2 groups (n1=30, n2=15), 37 potential variables, no equal variance in the two groups.)

many thanks
cheers
christoph
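The poster's own idea is straightforward to sketch: score variable subsets by the leave-one-out error that lda(..., CV = TRUE) returns for free, and add variables greedily. Shown here on made-up noise data (all names and sizes are illustrative only):

```r
library(MASS)
set.seed(1)
# made-up data: 2 groups (30/15), 6 candidate variables X1..X6
d <- data.frame(g = factor(rep(1:2, c(30, 15))), matrix(rnorm(45 * 6), 45, 6))
vars <- setdiff(names(d), "g")

# leave-one-out error for a given variable subset
loo.err <- function(v) {
  fit <- lda(d[, v, drop = FALSE], grouping = d$g, CV = TRUE)
  mean(fit$class != d$g)
}

err1  <- sapply(vars, function(v) loo.err(v))
best1 <- names(which.min(err1))                    # best single variable
err2  <- sapply(setdiff(vars, best1),
                function(v) loo.err(c(best1, v)))  # best partner for it
```

With 37 candidates and n=45, the selected error rate will be optimistically biased; a nested (outer) cross-validation around the whole selection procedure is needed for an honest estimate.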
[R] using split.screen() in Sweave
Dear R and Sweave users, a further problem which I couldn't resolve using the manual: in R I use the split.screen command to put e.g. two timecourses one above the other into one plot:

split.screen(c(2,1))
screen(1)
plot(stick, type = 'h', col = "red", lwd = 2)
screen(2)
plot(deconvolution.amplitude, type = 'h', col = "blue", lwd = 2)

Is there a similar way of doing this in Sweave?

many thanks
Christoph
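One answer sketch: within a single Sweave figure chunk, par(mfrow = c(2, 1)) (or layout()) stacks two base plots in one figure, since the whole chunk is evaluated on one device; split.screen() should also work as long as all screen() calls happen inside the same chunk. The vectors below are placeholders for the poster's timecourses:

```r
par(mfrow = c(2, 1))               # two rows, one column on the same device
plot(1:10, type = "h", col = "red",  lwd = 2)  # stand-in for 'stick'
plot(10:1, type = "h", col = "blue", lwd = 2)  # stand-in for the amplitudes
```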
[R] plot width in Sweave
Hi, I didn't find this in the manual: I need to change the width of a plot while I use Sweave. Which command/parameters should I insert below to change the width of a plot?

\begin{figure}[htbp]
\begin{center}
<<echo=TRUE, fig=TRUE>>=
plot(Re(q), ylab = "", type = "o", col = "blue", lwd = 1, sub = mystring)
@
\caption{Original stick function (stimulus train)}
\end{center}
\end{figure}

many thanks
christoph
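A sketch of the two knobs involved (standard Sweave options, assuming a stock Sweave setup): the chunk options width/height set the size of the R graphics device in inches, while \setkeys{Gin}{width=...} controls the width LaTeX scales the included graphic to:

```
\setkeys{Gin}{width=0.6\textwidth}   % LaTeX-side display width
<<echo=TRUE, fig=TRUE, width=8, height=4>>=
plot(Re(q), ylab = "", type = "o", col = "blue", lwd = 1, sub = mystring)
@
```

Changing only the device size alters proportions and font size relative to the plot; changing only the Gin width rescales the finished graphic.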
[R] conversion of some lines of matlab code to R
Dear R-users, I got some Matlab code (just a few lines) for the deconvolution of a signal:

function q=Dconvolve(funct)
w=squeeze(size(funct)); % gets the length
t=0:3:(w(1,1)-1)*3;
h=DeHemo(t,1.25,3);
r=fft(h);
rinv = 1./r;
q = real(ifft( fft(funct(:)).*rinv(:)));
%plot(t/0.75,q/max(q),'r-',t/0.75,funct/max(funct),'g-')

function h=DeHemo(t,tau,N)
h=(t/tau).^(N-1).*exp(-t/tau)./(tau*factorial(N-1));

Since I don't know Matlab: could anyone tell me how these lines would look in R, i.e. how I could do this deconvolution in R?

Many thanks
Cheers
Christoph
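A hedged line-by-line translation sketch (function names kept from the post; the key correspondences are that Matlab's ifft(z) is fft(z, inverse = TRUE)/length(z) in R, and factorial() exists in R too):

```r
DeHemo <- function(t, tau, N)
  (t/tau)^(N - 1) * exp(-t/tau) / (tau * factorial(N - 1))

Dconvolve <- function(funct) {
  w <- length(funct)                  # w = squeeze(size(funct))
  t <- seq(0, (w - 1) * 3, by = 3)    # t = 0:3:(w-1)*3
  h <- DeHemo(t, 1.25, 3)
  rinv <- 1 / fft(h)                  # rinv = 1./fft(h)
  # q = real(ifft(fft(funct(:)).*rinv(:)))
  Re(fft(fft(funct) * rinv, inverse = TRUE) / w)
}

# made-up test signal; the poster's 'funct' would go here
q <- Dconvolve(sin(seq(0, 2 * pi, length.out = 32)))
```

As in the Matlab original, this divides by fft(h) elementwise, so frequencies where fft(h) is near zero will blow up; in practice a regularized inverse is often substituted.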
[R] convolution question
Dear R-users, I have a question about convolution, using the convolve() command: the following code gives a function which will be convolved with a train of delta functions. Can anybody tell me why the convolved function doesn't have the same length as the original train of delta functions?

tau <- 1.25
N <- 3
t <- seq(0, 15, by = 3)
h <- (t/tau)^(N-1)*exp(-t/tau)/(tau*prod(N-1))
plot(h, type = 'o')

stick <- rep(0, 100)
ones <- round(runif(66)*99) + 1
stick[ones] <- 1
stick
X11()  # open another graphics device
plot(stick)

convolution <- convolve(stick, h, conj = FALSE, type = 'filter')
X11()  # open another graphics device
plot(convolution, type = 'o')

many thanks!
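The length difference is documented behaviour of convolve(): type = "filter" returns only the "valid" part of the convolution, of length length(x) - length(y) + 1, while type = "open" returns the full convolution of length length(x) + length(y) - 1. In the post, h has 6 points (seq(0, 15, by = 3)), so the filtered result has 100 - 6 + 1 = 95 points:

```r
x <- rep(1, 100)   # like 'stick' above
y <- rep(1, 6)     # like 'h': seq(0, 15, by = 3) has 6 points
length(convolve(x, y, conj = FALSE, type = "filter"))  # 95
length(convolve(x, y, type = "open"))                  # 105
```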
[R] using identify() together with plot () and pixmap()
Dear R users, I have a two-dimensional array whose values I want to plot using the pixmapGrey class. Plotting works fine, and now I would like to identify some of the points in the plot using identify(). But I get the following message when pressing the left mouse button:

plot(pixmapGrey(fmri.vtc[,,slice,volume]))
identify(fmri.vtc[,,slice,volume])
warning: no point within 0.25 inches

Pressing the right mouse button I get: numeric(0). What is the problem here and how can I solve it?

many thanks for your help
Cheers!
Christoph
Re: [R] overlay two pixmap
when I try to overlay a completely transparent pixmap on another pixmap, I get an error. For reproduction: just the transparent pixmap itself gives an error:

tmp <- array(0, c(x.dim, y.dim))
tmp <- pixmapIndexed(tmp[,])
for (x in 1:x.dim) {
  for (y in 1:y.dim) {
    [EMAIL PROTECTED],y] <- NA
  }
}
plot(tmp)

Error in image.default(x = X, y = Y, z = t([EMAIL PROTECTED]([EMAIL PROTECTED]):1, ]), :
  invalid z limits
In addition: Warning messages:
1: no finite arguments to min; returning Inf
2: no finite arguments to max; returning -Inf

What happened here?

many thanks
christoph

On Fri, 2003-09-26 at 12:44, Roger Bivand wrote:
> Christoph Lehmann wrote:
> > I need to overlay two pixmaps (library(pixmap)). One, a pixmapGrey, is
> > the basis, and on this I need to overlay a pixmapIndexed, BUT: the
> > pixmapIndexed has only some of its pixels set to an indexed color; many
> > of its pixels should not cover the underlying pixmapGrey pixels, i.e.
> > for the pixels not defined in the pixmapIndexed it should be
> > transparent. What would you recommend? Should I go for another
> > solution than pixmap?
> Determine which of the indexed colours in the pixmapIndexed object are to
> be transparent, and change them to NA - you access them in say:
>
> library(pixmap)
> x <- read.pnm(system.file("pictures/logo.ppm", package = "pixmap")[1])
> x
> plot(x)
> xx <- as(x, "pixmapIndexed")
> xx
> plot(xx)
> example(pixmap)
> z <- pixmapRGB(c(z1, z2, z3), 100, 100, bbox = c(0, 0, 100, 100))
> plot(z)
> [EMAIL PROTECTED]:20] [EMAIL PROTECTED] <- NA
> plot(xx, add=T)
>
> Here [EMAIL PROTECTED] was white aka "#FF", you could use col2rgb() to help set
> thresholds. Admittedly, this is messy if you don't know your threshold.
> Roger
[R] overlay two pixmap
Hi, I need to overlay two pixmaps (library(pixmap)). One, a pixmapGrey, is the basis, and on this I need to overlay a pixmapIndexed, BUT: the pixmapIndexed has only some of its pixels set to an indexed color; many of its pixels should not cover the underlying pixmapGrey pixels, i.e. for the pixels not defined in the pixmapIndexed it should be transparent. What would you recommend? Should I go for another solution than pixmap?

Many thanks
Cheers
Christoph
[R] Error from gls call (package nlme)
Hi, I have a huge array with series of data. For each cell in the array I fit a linear model, either using lm() or gls(). With lm() there is no problem, but with gls() I get an error:

Error in glsEstimate(glsSt, control = glsEstControl) :
  computed gls fit is singular, rank 2

as soon as there are data like this:

y1 <- c(0, 0, 0, 0)
x1 <- c(0, 1, 1.3, 0)
gls(y1 ~ x1)
Error in glsEstimate(glsSt, control = glsEstControl) :
  computed gls fit is singular, rank 2

Of course this is not a problem for lm():

lm(y1 ~ x1)
Call:
lm(formula = y1 ~ x1)
Coefficients:
(Intercept)           x1
          0            0

I know that such data make no sense, but it is possible that something like this occurs in my data set. Since I call gls() for every cell of my array in a loop, I don't want such errors to break my loop. What is the problem here? What are potential solutions?

Many thanks
Christoph
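A sketch of the usual guard for the loop problem: wrap each per-cell fit in try() (or tryCatch()) so a singular fit records NA instead of aborting the whole loop:

```r
library(nlme)
y1 <- c(0, 0, 0, 0)       # the degenerate cell from the post
x1 <- c(0, 1, 1.3, 0)
fit <- try(gls(y1 ~ x1), silent = TRUE)
# record NA for cells where gls() failed; otherwise keep the slope
slope <- if (inherits(fit, "try-error")) NA else coef(fit)[["x1"]]
slope
```

Inside the array loop, the same pattern fills a results array with NA wherever gls() cannot produce a fit.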
[R] storing objects (lm results) in an array
Hi, I have calculated lots (>1000) of linear models and would like to store each single result as an element of a 3D matrix or a similar structure, something like

glm[i][j][k] <- lm(...)

Since I read that results are lists: is it possible to define a matrix of type list? Or what do you recommend?

Many thanks
Christoph
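Yes: a matrix (or array) can hold list elements, and model objects are stored and retrieved with the `[[` indexing; a minimal sketch using the built-in cars data:

```r
# a 2 x 3 matrix whose cells are list elements
m <- matrix(vector("list", 6), nrow = 2, ncol = 3)
m[[1, 2]] <- lm(dist ~ speed, data = cars)
class(m[[1, 2]])   # "lm"
coef(m[[1, 2]])    # the stored fit is fully usable
```

For the 3D case, a plain list with a dim attribute of length three works the same way (dim(res) <- c(i, j, k), then res[[i, j, k]]).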
[R] anybody running Rggobi on a redhat 9.0 system?
Hi, my installation of ggobi (!) was successful, but when I try to install Rggobi as described on http://www.ggobi.org/INSTALL.html:

as non-root:
R_HOME=/usr/lib/R
export R_HOME
GGOBI_ROOT=/usr/local/src/ggobi
export GGOBI_ROOT
R_LIBS=/usr/lib/R/library
export R_LIBS

as root (su):
ln -s $GGOBI_ROOT/lib/libggobi.so /usr/lib/.
ln -s $GGOBI_ROOT/lib/libgtkext.so /usr/lib/.
R CMD INSTALL Rggobi_0.53-0.tar.gz

I get:

** R
** inst
** save image
Error in class<-(*tmp*, value = Class) : couldn't find function "objWithClass"
Warning message: package "methods" in options("defaultPackages") was not found
Error in class<-(*tmp*, value = Class) : couldn't find function "objWithClass"
Error in library(methods) : .First.lib failed
Execution halted
/usr/local/lib/R/bin/INSTALL: line 1: 14240 Broken pipe
cat ${R_PACKAGE_DIR}/R/${pkg}
ERROR: execution of package source for 'Rggobi' failed

many thanks for your help
Christoph
[R] PLS LDA
Dear R experts, I saw and downloaded the fresh pls package for R. Is there any way of using this pls package for PLS discriminant analysis? If not, is there any other package available? I need a way of classifying objects into e.g. two groups, where nbr_observations << nbr_variables.

many thanks for your kind help
Christoph
[R] logistic regression for a data set with perfect separation
Dear R experts, I have the following data:

      V1 V2
1 -5.800  0
2 -4.800  0
3 -2.867  0
4 -0.867  0
5 -0.733  0
6 -1.667  0
7 -0.133  1
8  1.200  1
9  1.333  1

and I want to know whether V1 can predict V2. Of course it can, since there is a perfect separation between cases 1..6 and 7..9. How can I test whether this conclusion (being able to assign an observation i to class j knowing only its value on variable V1) also holds for the population our data were drawn from? That is, which inference procedure is recommended? Logistic regression, for obvious reasons, makes no sense here.

Many thanks for your help
Christoph
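For reference, glm() makes the separation visible via the well-known warning "fitted probabilities numerically 0 or 1 occurred" and divergent coefficients; one simple exact alternative (my suggestion, not from the thread) is Fisher's exact test on the observed split, dichotomizing V1 in the gap between the two groups:

```r
V1 <- c(-5.800, -4.800, -2.867, -0.867, -0.733, -1.667, -0.133, 1.200, 1.333)
V2 <- c(0, 0, 0, 0, 0, 0, 1, 1, 1)
# the MLE does not exist under separation; coefficients grow without bound
fit <- suppressWarnings(glm(V2 ~ V1, family = binomial))
# exact test of the perfect split (-0.4 lies in the gap between the classes)
fisher.test(table(V1 > -0.4, V2))$p.value
```

Penalized (Firth) logistic regression is another standard remedy when separated data must still yield finite coefficient estimates.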