Re: [R] compiling under windows
On Thu, 30 Jun 2005, Philip Bermingham wrote:

> What is the best way to set up a project in Visual Studio, work on R and recompile? Is it better to use a different compiler or programming environment? I specifically want to work on C and Fortran extensions.

See the `R Installation and Administration' manual.

It is possible to use Visual C++ (there is no Fortran in Visual Studio, although a third-party* extension has been available), but it is easier and more reliable to use the compilers used to build R itself. Information on using VC++ is in the file README.packages, in the top-level directory of a binary installation and in R_HOME/src/gnuwin32 in the sources.

* Originally from DEC then Compaq and now apparently from Intel.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Nonlinear mixed-effects models (nlme)
Hello,

I am trying to fit a nonlinear model of the form

    A*x^b*exp(-c*x)

This represents a lactation curve. I have a bunch of cows, so I want COW to be a random effect. I have been trying the following code with very little success:

    fm1 <- nlme(yield ~ A*(DIM^B)*(exp(-C*DIM)),
                data = group,
                fixed = A + B + C ~ 1,
                start = c(A = 20, B = 0.3, C = 0.03))

Does anyone know how to add the random effect of the cow? I have used the command groupedData to have cow as subject (i.e., yield ~ DIM | cow). Is this a valid and sufficient approach? I have the feeling it is not sufficient. Also, does anyone know whether the formulation of the fixed effects is correct?

Thank you very much,
Alex
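One way to get a feel for a per-cow random effect on this curve is sketched below. Note that it deliberately swaps the nlme() call for a log-linearised fit with lme(), which is easier to get to converge but assumes multiplicative errors, so it is not the model in the question. The data are simulated and the names `dat`, `A_i` and `wood` are hypothetical. In nlme() itself the analogous step would be to supply a `random` argument such as `random = A ~ 1 | cow`; that variant is not run here.

```r
library(nlme)

set.seed(1)

# Hypothetical data: 10 cows, yield recorded every 10 days in milk (DIM).
dat <- expand.grid(DIM = seq(5, 305, by = 10), cow = factor(1:10))
A_i <- 20 * exp(rnorm(10, sd = 0.1))   # cow-specific scale parameter
dat$yield <- A_i[as.integer(dat$cow)] * dat$DIM^0.2 * exp(-0.004 * dat$DIM) *
  exp(rnorm(nrow(dat), sd = 0.05))     # multiplicative noise

# Log-linearised curve: log(yield) = log(A) + B*log(DIM) - C*DIM,
# with a random intercept (a random log A) for each cow.
fit <- lme(log(yield) ~ log(DIM) + DIM, random = ~ 1 | cow, data = dat)

# Map the fixed effects back to the A, B, C of the original curve.
est  <- fixef(fit)
wood <- c(A = exp(est[["(Intercept)"]]),
          B = est[["log(DIM)"]],
          C = -est[["DIM"]])
print(round(wood, 4))
```

The fixed effects should land close to the simulated values (A near 20, B near 0.2, C near 0.004), and `VarCorr(fit)` reports the between-cow variance.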
[R] p-values for classification
Dear All,

I'm classifying some data with various methods (binary classification). I'm interpreting the results via a confusion matrix from which I calculate the sensitivity and the fdr. The classifiers are trained on 575 data points and my test set has 50 data points.

I'd like to calculate p-values for obtaining <=fdr and >=sensitivity for each classifier. I was thinking about shuffling/bootstrapping the labels of the test set, classifying them and calculating the p-value from the obtained normally distributed random fdr and sensitivity.

The problem is that it's rather slow when running many rounds of shuffling/classification (I'd like to do this for many classifiers and parameter combinations). In addition, classification of the 50 test data points with shuffled labels realistically produces only a very limited number of possible fdr's and sensitivities, and I'm wondering if I can really believe these values to be normal.

Basically I'm looking for a way to calculate the p-values analytically. I'd be happy for any suggestions, web-addresses or references.

kind regards,
Arne
Re: [R] [OT] gmail filter for R-help and R-devel lists
One filter is enough:

    to: r-help

Xiaohua

On 6/30/05, Matthew Nelson [EMAIL PROTECTED] wrote:

> A clarification: this only works properly when the To addresses are in separate filters. Sorry for the confusion.
>
> Matt
>
> On 6/30/05, Matthew Nelson [EMAIL PROTECTED] wrote:
>
>> Doug,
>>
>> I was able to accomplish this for r-help by filtering on the To field with the following addresses:
>>
>>     r-help@stat.math.ethz.ch, [EMAIL PROTECTED]
>>
>> I created this a week ago and it has so far filtered every mailing list message successfully. Gmail conversations are a wonderful way to catch up on list activity after periods of neglect.
>>
>> Best regards,
>> Matt
>>
>> On 6/30/05, Douglas Bates [EMAIL PROTECTED] wrote:
>>
>>> This is slightly off-topic but I would be interested in whether anyone has succeeded in creating a filter expression for Google's gmail system that will select messages sent through the R-help and R-devel lists. It seems as if it should be easy to select on '[R]' or '[Rd]' in the subject line but I haven't been able to work out the exact syntax that would do this and not select messages that have an 'R' anywhere in the subject.

--
Xiaohua Dai, Dr.
Centre for Systems Research, Durban Institute of Technology
P.O.Box 953, Durban 4000, South Africa
Tel: +27-31-2042737(O) Fax: +27-31-2042736(O) Mobile: +27-723682954
[R] loop over large dataset
Hi All,

I'd like to ask for a few clarifications. I am doing some calculations over some biggish datasets. One has ~ 23000 rows, and 6 columns, the other has ~62 rows and 6 columns.

I am using these datasets to perform a simulation of haplotype coalescence over a pedigree (the datasets themselves are pedigree information). I created a new dataset (same number of rows as the pedigree dataset, 2 columns) and I use a looping function to assign haplotypes according to a standard biological reproductive process (i.e. meiosis, sexual reproduction).

My code is something like:

    off = function(sire, dam){ # simulation of reproduction, two inds
      sch.toll = round(runif(1, min = 1, max = 2))
      dch.toll = round(runif(1, min = 1, max = 2))
      s.gam = sire[,sch.toll]
      d.gam = dam[,dch.toll]
      offspring = cbind(s.gam,d.gam)
      # offspring
    }

    for (i in 1:dim(new)[1]){
      if(ped[i,3] != 0 && ped[i,5] != 0){
        zz = off(as.matrix(t(new[ped[i,3],])),as.matrix(t(new[ped[i,5],])))
        new[i,1] = zz[1]
        new[i,2] = zz[2]
      }
    }

I am also attributing a generation index to each row with a trivial calculation:

    for(i in atres){
      genz[i] = (genz[ped[i,3]] + genz[ped[i,5]])/2 + 1
      #print(genz[i])
    }

My question then. On the 23000 rows dataset the calculations take about 5 minutes. On the 62 rows one I kill the process after ~24 hours, and the job is not finished. Why such an immense discrepancy in execution times (the code is the same, the datasets are stored in two separate .RData files)?

Any light would be appreciated.

Federico

PS I am running R 2.1.0 on Debian Sarge, on a Dual 3 GHz Xeon machine with 2 gig RAM. The R process uses 99% of the CPU, but hardly any RAM from what I gather from top.

--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
[R] the format of the result
I write a function to get the frequency and prop of a variable.

    freq <- function(x, digits = 3) {
      naa <- is.na(x)
      nas <- sum(naa)
      if (any(naa)) x <- x[!naa]
      n <- length(x)
      ta <- table(x)
      prop <- prop.table(ta) * 100
      res <- rbind(ta, prop)
      rownames(res) <- c("Freq", "Prop")
      cat("Missing value(s) are", nas, ".\n")
      cat("Valid case(s) are", n, ".\n")
      cat("Total case(s) are", (n + nas), ".\n\n")
      print(res, digits = (digits + 2))
      cat("\n")
    }

    > freq(sample(letters[1:3],48,T),2)
    Missing value(s) are 0 .
    Valid case(s) are 48 .
    Total case(s) are 48 .

             a     b     c
    Freq 11.00 20.00 17.00
    Prop 22.92 41.67 35.42

and i want the result to be like

              a      b      c
    Freq  11.00  20.00  17.00
    Prop 22.92% 41.67% 35.42%

how should i change my function to get what i want?

--
Department of Sociology
Fudan University, Shanghai
Blog: http://sociology.yculblog.com
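One possible way to get the percent signs, offered as a sketch rather than as the list's answer, is to format both rows as character strings and print the resulting matrix without quotes. The function name `freq_table` is hypothetical, and the sketch leaves out the cat() summary lines of the original.

```r
# Build a character matrix so that the Prop row can carry a "%" suffix.
freq_table <- function(x, digits = 2) {
  x    <- x[!is.na(x)]                 # drop missing values
  ta   <- table(x)
  prop <- prop.table(ta) * 100
  fmt  <- paste0("%.", digits, "f")    # e.g. "%.2f"
  res  <- rbind(Freq = sprintf(fmt, as.numeric(ta)),
                Prop = paste0(sprintf(fmt, as.numeric(prop)), "%"))
  colnames(res) <- names(ta)
  res
}

out <- freq_table(c("a", "b", "b", "c", NA))
print(out, quote = FALSE, right = TRUE)
```

Because the result is a character matrix, it is meant for display only; keep the numeric table around if further calculations are needed.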
[R] Simple indexing conundrum
My apologies in advance for my thickness but I can't seem to solve the following, seemingly simple, data manipulation problem:

I have a data frame that contains multiple factors and multiple continuous response variables, but duplicates of some factor combinations. The duplicates contain bad data, so I would like to eliminate the duplicates. I would like to retain the entire rows identified by the maximum value of one particular continuous response variable.

For instance,

    > data(airquality)
    > str(airquality)
    `data.frame': 153 obs. of 6 variables:
     $ Ozone  : int 41 36 12 18 NA 28 23 19 8 NA ...
     $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
     $ Wind   : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
     $ Temp   : int 67 72 74 62 56 66 65 59 61 69 ...
     $ Month  : int 5 5 5 5 5 5 5 5 5 5 ...
     $ Day    : int 1 2 3 4 5 6 7 8 9 10 ...

I would like to subset airquality, retaining only the rows containing the maximum Solar.R for each month.

Any solution would be greatly appreciated.

Regards,
Hank

Dr. Martin Henry H. Stevens, Assistant Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056
Office: (513) 529-4206 Lab: (513) 529-4262 FAX: (513) 529-4243
http://www.cas.muohio.edu/botany/bot/henry.html
http://www.muohio.edu/ecology/
http://www.muohio.edu/botany/
E Pluribus Unum
Re: [R] p-values for classification
Not really an R question.

Most classifiers will produce predicted probabilities, and you can check their accuracy. There are lots of details in my PRNN book, and some examples in MASS4.

I suggest you adjust your training and test sets to be more nearly equal, or use cross-validation.

I don't see how shuffling the labels will help: you want to know how well a classifier does when there is a real relationship between the explanatory variables and the class. To take a simple example, suppose the classes are clearly linearly separable. Then a logistic discriminant will have nigh-perfect performance on the actual data, but very poor performance on permuted labels. You would do a lot better to simulate from a good fitted model, so-called parametric bootstrapping.

On Fri, 1 Jul 2005 [EMAIL PROTECTED] wrote:

> Dear All,
>
> I'm classifying some data with various methods (binary classification). [SNIP]
>
> I was thinking about shuffling/bootstrapping the labels of the test set, classifying them and calculating the p-value from the obtained normally distributed random fdr and sensitivity. [SNIP]
>
> Basically I'm looking for a way to calculate the p-values analytically. I'd be happy for any suggestions, web-addresses or references.
>
> kind regards,
> Arne

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
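The point about permuted labels can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not code from the thread: it uses one simulated predictor with well-separated (but not perfectly separable) classes, a logistic regression via glm(), and hypothetical names such as `make_data`, `acc_real` and `acc_perm`. The last few lines show one simple reading of "simulate from a fitted model": test labels are redrawn from the fitted probabilities to get a spread for the sensitivity.

```r
set.seed(42)

# Hypothetical two-class data: one predictor, class means 3 sd apart.
make_data <- function(n) {
  y <- rep(0:1, each = n / 2)
  data.frame(y = y, x = rnorm(n, mean = 3 * y))
}
train <- make_data(500)
test  <- make_data(200)

fit  <- glm(y ~ x, family = binomial, data = train)
prob <- predict(fit, newdata = test, type = "response")  # predicted probabilities
pred <- as.integer(prob > 0.5)

acc_real <- mean(pred == test$y)          # accuracy against the true labels
acc_perm <- mean(pred == sample(test$y))  # accuracy against shuffled labels
print(c(real = acc_real, permuted = acc_perm))

# Parametric bootstrap sketch: redraw test labels from the fitted model
# and recompute the sensitivity of the fixed predictions each time.
sens <- replicate(500, {
  y_sim <- rbinom(length(prob), 1, prob)
  sum(pred == 1 & y_sim == 1) / sum(y_sim == 1)
})
print(quantile(sens, c(0.025, 0.975)))
```

Accuracy against the shuffled labels hovers around one half whatever the classifier does, which is why it says little about performance when a real relationship exists.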
[R] R integration with Microsoft Powerpoint
Please allow me an unusual question. Is there any way that R can be closely integrated with a Microsoft PowerPoint presentation? I would like to embed R calculations in PowerPoint so that I will start PowerPoint, be prompted to enter some parameters, and an R function will run and return values and graphs.

Thanks,
John

R 2.1.1 windows 2k

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

410-605-7119
NOTE NEW EMAIL ADDRESS: [EMAIL PROTECTED]
[R] [R-pkgs] New CRAN package relax: R Editor for Literate Analysis and lateX
Now package relax is on CRAN.

The name relax is short for "R Editor for Literate Analysis and lateX".

The main element of package relax is the function relax(), which starts an all-in-one editor for data analysis and easy creation of LaTeX based documents with R. After calling relax() it creates a tcl/tk widget with a report field. The report field enables you to enter R expressions as well as pieces of text to document your ideas. Computations and plots can be included quickly. After finishing your work the sequence of text chunks, code chunks and integrated graphics and/or R-output will constitute the basis of your work. To achieve a higher quality relax integrates LaTeX compilation for professional formatting and pretty printing.

Dependencies:
* R (>= 2.1.0), tcltk
* relax runs on Windows systems; LaTeX / ghostscript has to be installed
* on Linux systems you have to install the img-package for tcltk

For further info see: http://www.wiwi.uni-bielefeld.de/~wolf/software/relax/relax.html

maintainer:
Hans Peter Wolf
Department of Economics
University of Bielefeld
[EMAIL PROTECTED]
[R] barplot legend
Hi,

Is it possible to put the legend outside of a barplot?

Thanks,
Sabine
Re: [R] loop over large dataset
My suggestion is that you try to vectorize the computation as much as you can.

From what you've shown, `new' and `ped' need to have the same number of rows, right? Your `off' function seems to be randomly choosing between columns 1 and 2 from its two input matrices (one row each?). You may want to do the sampling all at once instead of looping over the rows. E.g.,

    > (m <- matrix(1:10, ncol=2))
         [,1] [,2]
    [1,]    1    6
    [2,]    2    7
    [3,]    3    8
    [4,]    4    9
    [5,]    5   10
    > (colSample <- sample(1:2, nrow(m), replace=TRUE))
    [1] 1 1 2 1 1
    > (x <- m[cbind(1:nrow(m), colSample)])
    [1] 1 2 8 4 5

So you might want to do something like (obviously untested):

    todo <- ped[,3] * ped[,5] != 0  ## indicator of which rows to work on
    n.todo <- sum(todo)             ## how many are there?
    sire <- new[ped[todo, 3], ]
    dam <- new[ped[todo, 5], ]
    ## matrix indexing, as in the example above: one (row, column) pair per parent
    s.gam <- sire[cbind(1:nrow(sire), sample(1:2, nrow(sire), replace=TRUE))]
    d.gam <- dam[cbind(1:nrow(dam), sample(1:2, nrow(dam), replace=TRUE))]
    new[todo, 1:2] <- cbind(s.gam, d.gam)

Andy

From: Federico Calboli

> Hi All,
>
> I'd like to ask for a few clarifications. I am doing some calculations over some biggish datasets. [SNIP]
>
> My question then. On the 23000 rows dataset the calculations take about 5 minutes. On the 62 rows one I kill the process after ~24 hours, and the job is not finished. Why such an immense discrepancy in execution times (the code is the same, the datasets are stored in two separate .RData files)?
>
> Any light would be appreciated.
>
> Federico
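The vectorised sampling idea can be tried on a toy pedigree. The sketch below is self-contained and makes several assumptions that are not in the thread: `ped` and `new` are plain matrices, columns 3 and 5 of `ped` hold the sire and dam row numbers with 0 meaning unknown, and the founders in rows 1 to 3 already carry alleles.

```r
set.seed(1)

# Hypothetical toy pedigree: columns 3 and 5 hold sire and dam row numbers
# (0 = unknown parent), mirroring the layout described in the thread.
ped <- matrix(0, nrow = 6, ncol = 6)
ped[4:6, 3] <- c(1, 1, 2)   # sires of individuals 4..6
ped[4:6, 5] <- c(2, 3, 3)   # dams of individuals 4..6

# Haplotype table: founders (rows 1..3) carry known alleles, offspring are NA.
new <- matrix(NA_integer_, nrow = 6, ncol = 2)
new[1:3, ] <- matrix(c(11L, 21L, 31L, 12L, 22L, 32L), ncol = 2)

todo <- ped[, 3] != 0 & ped[, 5] != 0      # rows with both parents known
sire <- new[ped[todo, 3], , drop = FALSE]  # one row of alleles per sire
dam  <- new[ped[todo, 5], , drop = FALSE]  # one row of alleles per dam
n    <- sum(todo)

# One random column (1 or 2) per parent, picked by matrix indexing.
s.gam <- sire[cbind(seq_len(n), sample(1:2, n, replace = TRUE))]
d.gam <- dam[cbind(seq_len(n), sample(1:2, n, replace = TRUE))]
new[todo, ] <- cbind(s.gam, d.gam)
print(new)
```

A single pass like this only works when every parent already has its haplotypes assigned. In a multi-generation pedigree the same step would presumably have to be applied one generation at a time, oldest first.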
[R] It is time to say thank you.
I would like to express my thanks to the many people who got together and developed the R project. The idea, and work, of organizing and, for no compensation, supporting an open software project must have been (and still be) daunting. It is clear that the availability of a free, high-quality environment for programming and statistical analysis has allowed people around the world to perform analyses that they previously could not do. The continued time and effort that the R community gives to supporting R is greatly appreciated.

Many thanks to the organizers, developers, and supporters of the R project!

John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
Re: [R] Simple indexing conundrum
Is this close to what you want?

    > air.sub <- do.call(rbind, lapply(split(airquality, airquality$Month),
    +                    function(d) d[which.max(d$Solar.R),]))
    > air.sub
      Ozone Solar.R Wind Temp Month Day
    5    14     334 11.5   64     5  16
    6    NA     332 13.8   80     6  14
    7    40     314 10.9   83     7   6
    8    28     273 11.5   82     8  13
    9    24     259  9.7   73     9  10

Andy

From: Martin Henry H. Stevens

> I have a data frame that contains multiple factors and multiple continuous response variables, but duplicates of some factor combinations. [SNIP]
>
> I would like to subset airquality, retaining only the rows containing the maximum Solar.R for each month.
>
> Any solution would be greatly appreciated.
>
> Regards,
> Hank
Re: [R] barplot legend
On Fri, 2005-07-01 at 14:04 +0200, Navarre Sabine wrote:

> Hi,
> Is it possible to put the legend outside of a barplot?
>
> Thanks,
> Sabine

I presume that you mean outside the plot region? If so, you can use something like the following:

    # Adjust the plot margins to make room for the
    # legend on the right side. See ?par
    par(mar = c(5, 4, 4, 10) + 0.1)

    barplot(1:10)
    box()

    # Set xpd to allow legend placement outside
    # plot region. See ?par
    par(xpd = TRUE)

    # Left click on the right side of the window where you want
    # the legend. See ?locator
    l <- locator(1)

    # Now put the legend where you clicked
    # See ?legend
    legend(l$x, l$y, legend = "Legend Here")

HTH,

Marc Schwartz
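The locator() step above needs an interactive click. For scripts, a non-interactive variant is to compute the legend position from the plot's user coordinates. The sketch below is only one way to do that: the data and colours are made up, and pdf(NULL) is used just so the example runs without creating a file.

```r
pdf(NULL)  # throw-away device so the sketch runs in batch mode

counts <- c(a = 3, b = 7, c = 5)          # hypothetical bar heights
cols   <- c("grey30", "grey60", "grey90")

# Widen the right margin and allow drawing outside the plot region.
op <- par(mar = c(5, 4, 4, 10) + 0.1, xpd = TRUE)
barplot(counts, col = cols)

# par("usr") gives the plot region as c(x1, x2, y1, y2) in user coordinates;
# place the legend's top-left corner just to the right of x2.
usr <- par("usr")
leg <- legend(x = usr[2] + 0.05 * diff(usr[1:2]), y = usr[4],
              legend = names(counts), fill = cols)

par(op)
invisible(dev.off())
```

legend() returns the coordinates of the box it drew, so `leg$rect$left` can be compared with `usr[2]` to confirm that the legend starts outside the plot region.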
Re: [R] Simple indexing conundrum
Here is a different approach. I only send it since the result is slightly different, in that two rows are returned for Month 9 and the original row number is retained.

    > max2 <- function(x){max(x,na.rm=T)}
    > MonthMax <- with(airquality, ave(Solar.R,Month,FUN=max2))
    > new <- subset(airquality,Solar.R==MonthMax)
    > new
        Ozone Solar.R Wind Temp Month Day
    16     14     334 11.5   64     5  16
    45     NA     332 13.8   80     6  14
    67     40     314 10.9   83     7   6
    105    28     273 11.5   82     8  13
    133    24     259  9.7   73     9  10
    135    21     259 15.5   76     9  12

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Liaw, Andy
Sent: July 1, 2005 8:31 AM
To: 'Martin Henry H. Stevens'; R-Help
Subject: Re: [R] Simple indexing conundrum

> Is this close to what you want?
>
>     > air.sub <- do.call(rbind, lapply(split(airquality, airquality$Month),
>     +                    function(d) d[which.max(d$Solar.R),]))
>
> [SNIP]
>
> Andy
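The two approaches in this thread differ only in how they treat ties, which is easy to check side by side. The sketch below reruns both on airquality; the object names `first_max` and `all_max` are hypothetical, and the row counts in the comments are the ones reported in the messages above.

```r
data(airquality)

# One row per month: which.max() skips NAs and returns the first maximum.
first_max <- do.call(rbind, lapply(split(airquality, airquality$Month),
                                   function(d) d[which.max(d$Solar.R), ]))

# Every row that attains the monthly maximum: ties are kept, and which()
# drops the rows where Solar.R is NA.
month_max <- with(airquality,
                  ave(Solar.R, Month, FUN = function(x) max(x, na.rm = TRUE)))
all_max <- airquality[which(airquality$Solar.R == month_max), ]

print(nrow(first_max))  # 5 rows, one per month
print(nrow(all_max))    # 6 rows, September has a tie at 259
```

Which variant is appropriate depends on whether tied maxima count as duplicates to be dropped or as distinct observations to be kept.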
[R] 10^k axis labels {was .. (log scale on y-axis)}
>>>>> "Gabor" == Gabor Grothendieck [EMAIL PROTECTED]
>>>>>     on Thu, 30 Jun 2005 07:28:30 -0400 writes:

    Gabor> On 6/29/05, Jing Shen [EMAIL PROTECTED] wrote:
    >> I am planning to plot my data on log scale (y-axis). There is a parameter in plot function, which is
    >> plot( ..., log="y", ...)
    >> While, the problem is that it is with base of e. Is there a way to let me change it to 10 instead of e?

    Gabor> Is your question how to get the axis labels to be powers of 10?
    Gabor> In that case,
    Gabor> plot(1:100, log = "y", yaxt = "n") # do not show y axis
    Gabor> axis(2, c(1,10,100)) # draw y axis with required labels

and if you're there, you might be interested in the following, which provides a somewhat automated way to show "a * 10 ^ k" tick-labels instead of the scientific "a e k" ones.

{ For some time, I had wanted that something like this could become an easy option for builtin axis(*), but then I also know that we should rather strive to build future-proof tools, which hence should be applicable to 'grid' as well as to old-style 'graphics', and all this got me stuck in the process ... }

Martin Maechler, ETH Zurich

---

    axTexpr <- function(side, at = axTicks(side, axp=axp, usr=usr, log=log),
                        axp = NULL, usr = NULL, log = NULL)
    {
        ## Purpose: Do "a 10^k" labeling instead of "a e k"
        ##          this auxiliary should return 'at' and 'label' (expression)
        ## --
        ## Arguments: as for axTicks()
        ## --
        ## Author: Martin Maechler, Date: 7 May 2004, 18:01

        eT <- floor(log10(abs(at)))# at == 0 case is dealt with below
        mT <- at / 10^eT
        ss <- lapply(seq(along = at),
                     function(i) if(at[i] == 0) quote(0) else
                     substitute(A %*% 10^E, list(A=mT[i], E=eT[i])))
        do.call("expression", ss)
    }

    par(mar=.1+c(5,5,4,1))## For the horizontal y-axis labels, need more space
    plot(x,y, axes= FALSE, frame=TRUE)
    aX <- axTicks(1); axis(1, at=aX, labels= axTexpr(1, aX))
    if(FALSE) # rather the next one
    { aY <- axTicks(2); axis(2, at=aY, labels= axTexpr(2, aY))}
    ## or rather (horizontal labels on y-axis):
    aY <- axTicks(2); axis(2, at=aY, labels= axTexpr(2, aY), las=2)
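For the original question (a log-scaled y-axis with ticks labelled as powers of ten), a smaller sketch than axTexpr() is possible when the exponents are chosen by hand. The code below is an illustration under that assumption: the exponent range `k` and the data are made up, and pdf(NULL) is used only so the example runs without creating a file.

```r
pdf(NULL)  # throw-away device so the sketch runs in batch mode

x <- 1:100
k <- 0:4   # hypothetical exponent range covering the data below

# Build plotmath labels 10^0, 10^1, ... as an expression vector.
labs <- as.expression(lapply(k, function(e) bquote(10^.(e))))

op <- par(mar = 0.1 + c(5, 5, 4, 1))   # extra room for horizontal labels
plot(x, x^2, log = "y", yaxt = "n")    # suppress the default y axis
axis(2, at = 10^k, labels = labs, las = 1)
par(op)

invisible(dev.off())
```

Note that log = "y" plots on a base-10 scale already; only the default tick labels make it look otherwise, so relabelling the axis is all that is needed.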
Re: [R] FW: plot legend outside the grid
In principle you are there: just after opening the device, set par(mar) appropriately (with a large margin to place the legend in) before starting with plotting. In the legend call, specify coordinates which lie in the margin you have already expanded...

Uwe Ligges

Ghosh, Sandeep wrote:

> -----Original Message-----
> From: Ghosh, Sandeep
> Sent: Thursday, June 30, 2005 5:43 PM
> To: 'Berton Gunter'
> Subject: plot legend outside the grid
>
> Thanks for the pointers... I managed to get everything to look and feel the way I want except for the legend to plot outside the grid... Thanks for the note on par, but I'm not able to get it to plot outside the plot grid..
>
>     dataFrame <- as.data.frame(t(structure(c(
>     64,'wt',
>     62,'wt',
>     66,'wt',
>     [SNIP]
>     63,'hom',
>     64,'hom',
>     67,'hom'), .Dim=c(2,98))));
>     colnames(dataFrame) <- c('marbles_buried', 'genotype');
>     png('mb.png', width=400, height=400, pointsize=8);
>     dataFrame[c("marbles_buried")] <- lapply(dataFrame[c("marbles_buried")], function(x) as.numeric(levels(x)[x]));
>     par(xpd=FALSE)
>     with (dataFrame, stripchart(marbles_buried ~ genotype, method="jitter", vertical=TRUE, col = c('blue', 'red', 'green'), xlab='Genotype', ylab = "Marbles Buried", main='MBA WTs Vs HOMs', pch=c(1,4,2), jitter=1/3.5, cex=1))
>     meds <- as.vector(with(dataFrame, by(marbles_buried, genotype, mean)))
>     segments((1:3)-0.25, meds, (1:3)+0.25, meds, col = c('blue', 'red', 'green'));
>     dataWt <- subset(dataFrame, genotype=='wt', select=c(marbles_buried,genotype));
>     dataHet <- subset(dataFrame, genotype=='het', select=c(marbles_buried,genotype));
>     dataHom <- subset(dataFrame, genotype=='hom', select=c(marbles_buried,genotype));
>     wtCount <- length(dataWt$marbles_buried);
>     hetCount <- length(dataHet$marbles_buried);
>     homCount <- length(dataHom$marbles_buried);
>     wtLegend <- paste ("wt", "(n=", wtCount, ")");
>     hetLegend <- paste ("het", "(n=", hetCount, ")");
>     homLegend <- paste ("hom", "(n=", homCount, ")");
>     par(xpd=TRUE)
>     legend(1, max(as.vector(dataFrame$marbles_buried)), c(wtLegend, hetLegend, homLegend), col=c('blue', 'red', 'green'), pch=c(1,4,2));
>
> -Thanks
> Sandeep.
>
> -----Original Message-----
> From: Berton Gunter [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 30, 2005 2:55 PM
> To: Ghosh, Sandeep
> Subject: RE: [R] Help with stripplot
>
> Of course! stripchart() is a base graphics function and therefore has available to it the base graphics functionality, like (the base graphics function, **not** the lattice argument) legend(). See ?legend in the graphics package. Note the use of locator() for positioning the legend.
>
> Note: By default the legend will be clipped to the plot region. If you wish to have a legend outside the plot region set the xpd parameter of par to TRUE or NA prior to plotting.
>
> -- Bert
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ghosh, Sandeep
> Sent: Thursday, June 30, 2005 12:22 PM
> To: Deepayan Sarkar; r-help@stat.math.ethz.ch
> Subject: Re: [R] Help with stripplot
>
> Another question: in stripchart, is there a way to draw a legend? I need a legend that gives the mice count for each genotype wt/het/hom, something like the xyplot support for key/auto.key.
>
> -Sandeep
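The advice in this thread (enlarge the margin first, then give legend() coordinates that fall inside that margin) can be combined with a shorter way of building the "n=" labels. The sketch below uses made-up scores in place of the elided data, builds the labels from table() rather than three subsets, and relies on the `xpd` argument of legend() so that only the legend is allowed outside the plot region. The object names are hypothetical.

```r
pdf(NULL)  # throw-away device so the sketch runs in batch mode
set.seed(3)

# Hypothetical data standing in for the marble-burying scores.
df <- data.frame(
  marbles_buried = c(64, 62, 66, 60, 58, 61, 63, 64, 67),
  genotype = factor(rep(c("wt", "het", "hom"), times = c(4, 2, 3)),
                    levels = c("wt", "het", "hom"))
)

# Legend labels with per-genotype counts, in factor-level order.
counts <- table(df$genotype)
labs   <- paste0(names(counts), " (n=", as.vector(counts), ")")

cols <- c("blue", "red", "green")
pchs <- c(1, 4, 2)

op <- par(mar = c(5, 4, 4, 9) + 0.1)   # large right margin for the legend
stripchart(marbles_buried ~ genotype, data = df, method = "jitter",
           vertical = TRUE, col = cols, pch = pchs,
           xlab = "Genotype", ylab = "Marbles Buried")

# Top-left corner of the legend sits just right of the plot region.
usr <- par("usr")
leg <- legend(usr[2] + 0.03 * diff(usr[1:2]), usr[4],
              legend = labs, col = cols, pch = pchs, xpd = TRUE)

par(op)
invisible(dev.off())
```

Passing xpd to legend() instead of par() keeps the clipping of the strip chart itself unchanged.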
Re: [R] Simple indexing conundrum
Thanks guys! Both solutions do what I need. Thanks.

Hank

On Jul 1, 2005, at 8:45 AM, Jim Brennan wrote:

> Here is a different approach. I only send it since the result is slightly different, in that two rows are returned for Month 9 and the original row number is retained.
>
> [SNIP]
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Liaw, Andy
> Sent: July 1, 2005 8:31 AM
> To: 'Martin Henry H. Stevens'; R-Help
> Subject: Re: [R] Simple indexing conundrum
>
> Is this close to what you want?
>
> [SNIP]
Re: [R] 10^k axis labels {was .. (log scale on y-axis)}
On 7/1/05, Martin Maechler [EMAIL PROTECTED] wrote:

"Gabor" == Gabor Grothendieck [EMAIL PROTECTED] on Thu, 30 Jun 2005 07:28:30 -0400 writes:

Gabor> On 6/29/05, Jing Shen [EMAIL PROTECTED] wrote:
Gabor> I am planning to plot my data on log scale (y-axis). There is a parameter in the plot function for this, plot( ..., log="y", ...). The problem is that it uses base e. Is there a way to change it to 10 instead of e?

Gabor> Is your question how to get the axis labels to be powers of 10?
Gabor> In that case,
Gabor>   plot(1:100, log = "y", yaxt = "n")  # do not show y axis
Gabor>   axis(2, c(1,10,100))                # draw y axis with required labels

and if you're there, you might be interested in the following, which provides a somewhat automated way to show "a * 10 ^ k" tick labels instead of the scientific "a e k" ones.

{ For some time, I had wanted something like this to become an easy option for the builtin axis(*), but then I also know that we should rather strive to build future-proof tools, which hence should be applicable to 'grid' as well as to old-style 'graphics', and all this got me stuck in the process ... }

Martin Maechler, ETH Zurich

---

axTexpr <- function(side, at = axTicks(side, axp = axp, usr = usr, log = log),
                    axp = NULL, usr = NULL, log = NULL)
{
    ## Purpose: Do a "a 10^k" labeling instead of "a e<k>"
    ##          this auxiliary should return 'at' and 'label' (expression)
    ## ----------------------------------------------------------------
    ## Arguments: as for axTicks()
    ## ----------------------------------------------------------------
    ## Author: Martin Maechler, Date: 7 May 2004, 18:01
    eT <- floor(log10(abs(at)))  # at == 0 case is dealt with below
    mT <- at / 10^eT
    ss <- lapply(seq(along = at),
                 function(i) if(at[i] == 0) quote(0) else
                     substitute(A %*% 10^E, list(A = mT[i], E = eT[i])))
    do.call("expression", ss)
}

par(mar = .1 + c(5,5,4,1))  ## For the horizontal y-axis labels, need more space
plot(x, y, axes = FALSE, frame = TRUE)
aX <- axTicks(1); axis(1, at = aX, labels = axTexpr(1, aX))
if(FALSE) # rather the next one
{ aY <- axTicks(2); axis(2, at = aY, labels = axTexpr(2, aY)) }
## or rather (horizontal labels on y-axis):
aY <- axTicks(2); axis(2, at = aY, labels = axTexpr(2, aY), las = 2)

This may not be as good as what you have (although it's arguably prettier in the specific example below), but it may suffice in many, though possibly not all, cases. I mention it since it's very simple and, in fact, requires no auxiliary routines. It uses your idea of employing axTicks. The key trick is to use axTicks twice in axis:

x <- 10 ^ seq(-2, 10)   # test data
plot(x, log = "y", yaxt = "n")
axis(2, axTicks(2), axTicks(2))
Re: [R] Reconstructing LD function
On Mon, 2005-06-27 at 13:18, Prof Brian Ripley wrote:

On Mon, 27 Jun 2005 [EMAIL PROTECTED] wrote:

in an LDA analysis with n groups n-1 LD functions result. Implicitly this defines an LD function for the last group. Does there exist code already to explicitly construct this LD function?

Thank you for the quick reply.

What `LDA analysis' are you discussing here? (LDA is usually `linear discriminant analysis', so what did you mean and what R function are you not referring to?) R has lda in package MASS, and that works with n LD functions. To reduce it to n-1, subtract the last one from the others, in which case LD_n == 0.

Indeed I have been using the MASS::lda package.

Anything you do in LD analysis only depends on differences in LD functions, and there really are n of them. With two groups one is conventionally taken to be zero (the first, usually, not the last).

How is the classification decision reached from the LD functions? Are those what is known as linear Fisherian discriminant functions? If so, I'm not positive about why one of these functions can be set to 0.

Thank you in advance for the clarification.

Best wishes,
Stefan
Re: [R] Reconstructing LD function
On Fri, 1 Jul 2005 [EMAIL PROTECTED] wrote:

On Mon, 2005-06-27 at 13:18, Prof Brian Ripley wrote:

On Mon, 27 Jun 2005 [EMAIL PROTECTED] wrote:

in an LDA analysis with n groups n-1 LD functions result. Implicitly this defines an LD function for the last group. Does there exist code already to explicitly construct this LD function?

Thank you for the quick reply.

What `LDA analysis' are you discussing here? (LDA is usually `linear discriminant analysis', so what did you mean and what R function are you not referring to?) R has lda in package MASS, and that works with n LD functions. To reduce it to n-1, subtract the last one from the others, in which case LD_n == 0.

Indeed I have been using the MASS::lda package.

Anything you do in LD analysis only depends on differences in LD functions, and there really are n of them. With two groups one is conventionally taken to be zero (the first, usually, not the last).

How is the classification decision reached from the LD functions? Are those what is known as linear Fisherian discriminant functions? If so, I'm not positive about why one of these functions can be set to 0.

What I did say:

`Anything you do in LD analysis only depends on differences in LD functions'

So subtracting any one function from the others does not change the differences.

LD is not about classification, and Fisher did not do classification, nor did he use more than 2 classes. I suspect your difficulty is going to be clearing your preconceptions. lda() is support software for a book which does explain the relationship between Fisher's discrimination and classification: please consult it for the background.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
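The point about differences can be checked numerically. The following is only a sketch under stated assumptions: it does not use or reproduce the internals of MASS::lda, but builds the textbook linear scores (group means, pooled covariance, empirical priors) with base R on iris, and then uses the "largest score" rule as one downstream quantity that should be unaffected by subtracting one function from all the others. The names LD, LD_ref, class_full and class_ref are illustrative.

```r
X <- as.matrix(iris[, 1:4])
g <- iris$Species

means <- rowsum(X, g) / as.vector(table(g))      # one row of means per group
resid <- X - means[as.character(g), ]            # within-group residuals
S     <- crossprod(resid) / (nrow(X) - nlevels(g))  # pooled covariance matrix
prior <- as.vector(table(g)) / length(g)

Sinv_mu <- solve(S, t(means))                    # column k is S^{-1} mu_k
const   <- -0.5 * colSums(t(means) * Sinv_mu) + log(prior)
LD      <- sweep(X %*% Sinv_mu, 2, const, "+")   # n score columns, one per group
colnames(LD) <- rownames(means)

LD_ref <- LD - LD[, ncol(LD)]                    # subtract the last function: LD_n == 0

# The largest score per row is the same before and after the subtraction.
class_full <- colnames(LD)[max.col(LD, ties.method = "first")]
class_ref  <- colnames(LD_ref)[max.col(LD_ref, ties.method = "first")]
```

Because the same quantity is subtracted from every score in a row, all pairwise differences, and hence anything computed from them, are unchanged.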
Re: [R] Nolinear mixed-effects models (nlme)
On 6/30/05, Alex Bach [EMAIL PROTECTED] wrote:

Hello, I am trying to fit a nonlinear model of the form A*x^b*exp(-c*x). This represents a lactation curve. I have a bunch of cows, so I want COW to be a random effect.

You need to decide which of the model parameters (i.e. A, B and C) should have a random effect grouped by COW and to specify this in your call to nlme.

I have been trying the following code with very little success:

fm1 <- nlme(yield ~ A*(DIM^B)*(exp(-C*DIM)),
            data = group,
            fixed = A + B + C ~ 1,
            start = c(A = 20, B = 0.3, C = 0.03))

Does anyone know how to add the random effect of the cow? I have used the command groupedData to have Cow as subject (i.e., yield ~ DIM | cow). Is this a valid and sufficient approach? I have the feeling it is not sufficient. Also, does anyone know whether the formulation of the fixed effects is correct?

Thank you very much,
Alex

[[alternative HTML version deleted]]
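As an illustration of specifying the random effect in the nlme call, here is a sketch that assumes only the scale parameter A varies by cow. The data are simulated stand-ins (the poster's data are not available), so the parameter values, the data frame contents and the choice random = A ~ 1 | cow are all assumptions; which parameters should vary by cow is a modelling decision.

```r
library(nlme)

# Simulated stand-in data; every number here is made up for illustration.
set.seed(42)
n_cow <- 20
DIM   <- seq(10, 300, by = 10)                    # days in milk
A_i   <- 20 + rnorm(n_cow, sd = 2)                # cow-specific value of A
group <- expand.grid(DIM = DIM, cow = factor(seq_len(n_cow)))
group$yield <- A_i[group$cow] * group$DIM^0.2 * exp(-0.004 * group$DIM) +
  rnorm(nrow(group), sd = 1)

fm1 <- nlme(yield ~ A * (DIM^B) * exp(-C * DIM),
            data   = group,
            fixed  = A + B + C ~ 1,               # one population value per parameter
            random = A ~ 1 | cow,                 # A gets a random effect grouped by cow
            start  = c(A = 20, B = 0.2, C = 0.004))
fixef(fm1)
```

To let more parameters vary, the random argument could become, for example, A + B ~ 1 | cow, at the cost of a harder optimisation problem.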
[R] plot svm
Hello,

I'm working with DNA microarrays and want to classify them with SVM. I want to plot the results, but I have not been able to. I found other tutorials and examples (with the iris and cats data) where you can plot the results with plot.svm, but you need to write a formula, and I don't know how to do this with the golubEsets data, for example:

plot(svm1, golubTrain, formula)

For example, with the iris data:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
7          4.6         3.4          1.4         0.3  setosa
8          5.0         3.4          1.5         0.2  setosa

m2 <- svm(Species ~ ., data = iris)
plot(m2, iris, Petal.Width ~ Petal.Length, slice = list(Sepal.Width = 3, Sepal.Length = 4))

I would be grateful if you could send me information about how to plot the golubEsets data (for example the formula, because I have tested several options but none of them works). My data are very similar (expression values under several conditions), so I could plot my results if I knew how to plot the golub data.

Thanks a lot,
Beatriz
Re: [R] R integration with Microsoft Powerpoint
On Fri, 1 Jul 2005, John Sorkin wrote: Please allow me an unusual question. Is there any way that R can be closely integrated with a Microsoft Powerpoint presentation? I would like to embed R calculations in Powerpoint so that I will start Powerpoint, be prompted to enter some parameters, and an R function will run and return values and graphs. R can be driven by COM, so if Powerpoint supports COM (possibly via VBA) this would be possible. It is likely, as other MS Office applications do. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] [OT] gmail filter for R-help and R-devel lists
i dont use gmail, but this method *may* run into problems if people are replying to a message and r-help is on the cc: line. thunderbird has a to: or cc: option for this... is gmail's to: field a default for to: or cc: ? -s. Deepayan Sarkar wrote: On 6/30/05, Douglas Bates [EMAIL PROTECTED] wrote: This is slightly off-topic but I would be interested in whether anyone has succeeded in creating a filter expression for Google's gmail system that will select messages sent through the R-help and R-devel lists. It seems as if it should be easy to select on '[R]' or '[Rd]' in the subject line but I haven't been able to work out the exact syntax that would do this and not select messages that have an 'R' anywhere in the subject. I filter on the To field, which mostly works: Matches: to:(r-help@stat.math.ethz.ch) Do this: Skip Inbox, Apply label r-help Matches: to:([EMAIL PROTECTED]) Do this: Skip Inbox, Apply label r-help Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] the format of the result
See ?sprintf. For example, replace your prop line with:

prop <- sprintf("%.2f%%", prop.table(ta) * 100)

_ mailto:[EMAIL PROTECTED] Kenneth Ray Hobson, P.E. Oklahoma DOT - QA IAS Manager 200 N.E. 21st Street Oklahoma City, OK 73105-3204 (405) 522-4985, (405) 522-0552 fax Visit our website at: http://www.okladot.state.ok.us/materials/materials.htm
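A small worked example of the sprintf() suggestion, using a made-up vector x. Note that sprintf() returns a plain character vector without names, so the category names are restored by hand here; that step is an addition for illustration, not part of the original suggestion.

```r
x  <- c("a", "a", "b", "c")
ta <- table(x)

# "%.2f" formats with two decimals; "%%" prints a literal percent sign.
prop <- sprintf("%.2f%%", as.vector(prop.table(ta)) * 100)
names(prop) <- names(ta)
prop
```

Because prop is now character, combining it with the counts via rbind() would turn the counts into character strings as well.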
Re: [R] [OT] gmail filter for R-help and R-devel lists
Yes, Douglas has proved this. On 7/1/05, Suresh Krishna [EMAIL PROTECTED] wrote: i dont use gmail, but this method *may* run into problems if people are replying to a message and r-help is on the cc: line. thunderbird has a to: or cc: option for this... is gmail's to: field a default for to: or cc: ? -s. Deepayan Sarkar wrote: On 6/30/05, Douglas Bates [EMAIL PROTECTED] wrote: This is slightly off-topic but I would be interested in whether anyone has succeeded in creating a filter expression for Google's gmail system that will select messages sent through the R-help and R-devel lists. It seems as if it should be easy to select on '[R]' or '[Rd]' in the subject line but I haven't been able to work out the exact syntax that would do this and not select messages that have an 'R' anywhere in the subject. I filter on the To field, which mostly works: Matches: to:(r-help@stat.math.ethz.ch) Do this: Skip Inbox, Apply label r-help Matches: to:([EMAIL PROTECTED]) Do this: Skip Inbox, Apply label r-help Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] R integration with Microsoft Powerpoint
Sure. Just run R in a BAT file. You just reference the BAT file in PowerPoint like any other EXE application via an OLE link. Of course you can always use VBA code in Powerpoint to Shell() to the BAT program. In R, type ?BATCH to see how the BAT file's content line should be coded to run the R program. mailto:[EMAIL PROTECTED] Kenneth Ray Hobson, P.E. Oklahoma DOT - QA IAS Manager 200 N.E. 21st Street Oklahoma City, OK 73105-3204 (405) 522-4985, (405) 522-0552 fax Visit our website at: http://www.okladot.state.ok.us/materials/materials.htm __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ranking predictive features in logsitic regression
On 30 Jun, 2005, at 21:20, Stephen Choularton wrote: Hi Is there some function R that multiplies each coefficient by the standard deviation of the corresponding variable and produces a ranking? Possibly you meant un-signed coefficients? In which case something like function(model) rank(abs(coef(model)) * apply(model.matrix(model), 2, sd)) should do what you asked about. The relimp package provides approximate inference for comparisons of this kind. I should say that I don't think that such a ranking will often be very useful, though. Some coefficients will be determined with greater precision than others, and there may be correlations to worry about, or variables may only make sense when considered in groups (eg factor effects, or interactions with corresponding main effects, etc.) David __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
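A usage sketch for the one-line function above, applied to a logistic regression on the built-in mtcars data. The function name rank_by_scaled_coef and the choice of model are illustrative assumptions. The intercept column of the model matrix has standard deviation zero, so the intercept always receives the lowest rank and can simply be ignored.

```r
rank_by_scaled_coef <- function(model)
  rank(abs(coef(model)) * apply(model.matrix(model), 2, sd))

fit <- glm(am ~ wt + hp, family = binomial, data = mtcars)

r <- rank_by_scaled_coef(fit)   # rank 1 = smallest scaled coefficient
r[-1]                           # drop the intercept, whose scaled value is always 0
```

As the message notes, such a ranking ignores the precision of the estimates and correlations among predictors, so it is at most a rough descriptive summary.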
[R] OT: How to instaill gcc in cygwin?
Dear Listers, I know it is far off topic. But I do know there must be some people here who know it very well. Sorry for bothering others. Thanks. -- WenSui Liu, MS MA Senior Decision Support Analyst Division of Health Policy and Clinical Effectiveness Cincinnati Children Hospital Medical Center __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Calculate 3D Fixed Kernel Home Range
I have x,y data on three animals (~150 data points each). I have calculated the fixed kernel home range using the 'adehabitat' library and the LSCV smoothing factor. Can anyone provide me with some help on how to display the density estimate of the Utilization Distribution 3-dimensionally? Thanks in advance, Jared __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] the format of the result
On Fri, 2005-07-01 at 19:40 +0800, ronggui wrote:

I wrote a function to get the frequency and proportion of a variable.

freq <- function(x, digits = 3) {
  naa <- is.na(x)
  nas <- sum(naa)
  if (any(naa)) x <- x[!naa]
  n <- length(x)
  ta <- table(x)
  prop <- prop.table(ta) * 100
  res <- rbind(ta, prop)
  rownames(res) <- c("Freq", "Prop")
  cat("Missing value(s) are", nas, ".\n")
  cat("Valid case(s) are", n, ".\n")
  cat("Total case(s) are", (n + nas), ".\n\n")
  print(res, digits = (digits + 2))
  cat("\n")
}

freq(sample(letters[1:3], 48, T), 2)
Missing value(s) are 0 .
Valid case(s) are 48 .
Total case(s) are 48 .

         a     b     c
Freq 11.00 20.00 17.00
Prop 22.92 41.67 35.42

and I want the result to look like

          a      b      c
Freq  11.00  20.00  17.00
Prop 22.92% 41.67% 35.42%

How should I change my function to get what I want?

Here is a modification of the function that I think should work. Note that part of the output formatting process has to take into account the a priori unknowns involving your 'digits' argument, the lengths of the dimnames resulting from the table, and the lengths of the frequency counts in the table. Thus, a fair amount of the code is establishing the 'width' argument, which is then used in formatC() so that the output can be column aligned properly.

Note that by default, table() will exclude NA, so you do not need to subset 'x' before using table(). Also, note that I changed Prop to Pct.

freq <- function(x, digits = 3)
{
  n <- length(x)
  missing <- sum(is.na(x))
  ta <- table(x)
  pct <- prop.table(ta) * 100

  width <- max(nchar(unlist(dimnames(ta))) + 1,
               nchar(ta) + digits + 1,
               5 + digits)

  ## The separator and padding strings below were lost in the archive;
  ## single spaces are assumed.
  Vals <- paste(formatC(unlist(dimnames(ta)), format = "s", width = width),
                collapse = " ")
  Freq <- paste(formatC(ta, format = "f", digits = digits, width = width),
                collapse = " ")
  Pct <- paste(formatC(pct, format = "f", digits = digits, width = width),
               "%", sep = "", collapse = " ")

  cat("Missing value(s) are", missing, ".\n")
  cat("Valid case(s) are", n - missing, ".\n")
  cat("Total case(s) are", n, ".\n\n")
  cat("    ", Vals, "\n")
  cat("Freq", Freq, "\n")
  cat("Pct ", Pct, "\n")
  cat("\n")
}

Thus:

freq(sample(letters[1:3], 48, TRUE), 2)
Missing value(s) are 0 .
Valid case(s) are 48 .
Total case(s) are 48 .

          a       b       c
Freq  28.00    8.00   12.00
Pct   58.33%  16.67%  25.00%

freq(sample(c(letters[1:3], NA), 1000, TRUE), 2)
Missing value(s) are 257 .
Valid case(s) are 743 .
Total case(s) are 1000 .

           a       b       c
Freq  250.00  218.00  275.00
Pct    33.65%  29.34%  37.01%

freq(iris$Species)
Missing value(s) are 0 .
Valid case(s) are 150 .
Total case(s) are 150 .

         setosa  versicolor   virginica
Freq     50.000      50.000      50.000
Pct      33.333%     33.333%     33.333%

freq(iris$Species, 0)
Missing value(s) are 0 .
Valid case(s) are 150 .
Total case(s) are 150 .

         setosa  versicolor   virginica
Freq         50          50          50
Pct          33%         33%         33%

HTH,
Marc Schwartz
[R] Lines for plot (Sweave)
Dear List:

I am generating a series of plots iteratively using Sweave. In short, a data frame is subsetted row by row, and graphics are created conditional on the data in each row. In this particular case, the code ends up generating 17,000 individual plots. In some cases, all student data (this is student achievement data) are available, and my code below works very well in the sense that a line connects all points. However, in some cases there are missing data, and I need to modify my code so that lines are connected through all points even when data are missing.

Here is a snippet of the relevant code. In the actual program, the data in stu.vector and avg.vector are obtained from the data frame as the program loops through each row.

stu.vector <- c(2500, 2510, NA, 2600)
avg.vector <- c(2635, 2589, 2628, 2685)
x <- c(0, 1, 2, 3)
graph.min <- min(stu.vector, avg.vector, na.rm = TRUE) - 150
graph.max <- max(stu.vector, avg.vector, na.rm = TRUE) + 150
plot(x, stu.vector, ylim = c(graph.min, graph.max), xlab = "", ylab = "Scaled Score", xaxt = 'n', pch = 2, col = 'blue', main = "Math Growth Rate")
points(x, avg.vector, pch = 1, col = 'red')
lines(x, stu.vector, lty = 1, col = 'blue')
lines(x, avg.vector, lty = 3, col = 'red')

If the NA did not exist in stu.vector, all points would be connected with lines. However, in some cases data are missing, and I still need to connect the data in stu.vector with lines. In this case, the line would connect points 1 and 2, and then points 2 and 4, even though point 3 is missing.

Can anyone suggest how I might do this?

Thanks,
Harold

[[alternative HTML version deleted]]
Re: [R] Lines for plot (Sweave)
On 7/1/05, Doran, Harold [EMAIL PROTECTED] wrote:

Dear List:

I am generating a series of plots iteratively using Sweave. In short, a data frame is subsetted row by row, and graphics are created conditional on the data in each row. In this particular case, the code ends up generating 17,000 individual plots. In some cases, all student data (this is student achievement data) are available, and my code below works very well in the sense that a line connects all points. However, in some cases there are missing data, and I need to modify my code so that lines are connected through all points even when data are missing.

Here is a snippet of the relevant code. In the actual program, the data in stu.vector and avg.vector are obtained from the data frame as the program loops through each row.

stu.vector <- c(2500, 2510, NA, 2600)
avg.vector <- c(2635, 2589, 2628, 2685)
x <- c(0, 1, 2, 3)
graph.min <- min(stu.vector, avg.vector, na.rm = TRUE) - 150
graph.max <- max(stu.vector, avg.vector, na.rm = TRUE) + 150
plot(x, stu.vector, ylim = c(graph.min, graph.max), xlab = "", ylab = "Scaled Score", xaxt = 'n', pch = 2, col = 'blue', main = "Math Growth Rate")
points(x, avg.vector, pch = 1, col = 'red')
lines(x, stu.vector, lty = 1, col = 'blue')
lines(x, avg.vector, lty = 3, col = 'red')

If the NA did not exist in stu.vector, all points would be connected with lines. However, in some cases data are missing, and I still need to connect the data in stu.vector with lines. In this case, the line would connect points 1 and 2, and then points 2 and 4, even though point 3 is missing.

Replace the first lines statement with:

lines(approx(x, stu.vector), lty = 1, col = 'blue')
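To see why the approx() suggestion bridges the gap, here is a short sketch using the vectors from the question. The names interp, mid and ok are illustrative, and the plot is sent to a null device only so that the code runs non-interactively.

```r
x          <- c(0, 1, 2, 3)
stu.vector <- c(2500, 2510, NA, 2600)

# approx() discards the incomplete (x, y) pair and interpolates linearly
# between the remaining points, so x = 1 and x = 3 are joined directly.
interp <- approx(x, stu.vector)              # list(x, y), 50 points by default
mid    <- approx(x, stu.vector, xout = 2)$y  # value on the line at the missing point

# An equivalent path: keep only the observed points and join those.
ok <- !is.na(stu.vector)

pdf(NULL)
plot(x, stu.vector, pch = 2, col = "blue", ylab = "Scaled Score")
lines(interp, lty = 1, col = "blue")
lines(x[ok], stu.vector[ok], lty = 1, col = "blue")  # overlays the same segments
invisible(dev.off())
```

Here mid is 2555, halfway between the observed scores 2510 and 2600, which confirms that the drawn segment passes straight over the missing point.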
[R] zlim for levelplot
__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Lines for plot (Sweave)
Fabulous, it works great. I didn't know about approx(). Thank you.

-----Original Message-----
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
Sent: Friday, July 01, 2005 1:54 PM
To: Doran, Harold
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Lines for plot (Sweave)

On 7/1/05, Doran, Harold [EMAIL PROTECTED] wrote:

Dear List:

I am generating a series of plots iteratively using Sweave. In short, a data frame is subsetted row by row, and graphics are created conditional on the data in each row. In this particular case, the code ends up generating 17,000 individual plots. In some cases, all student data (this is student achievement data) are available, and my code below works very well in the sense that a line connects all points. However, in some cases there are missing data, and I need to modify my code so that lines are connected through all points even when data are missing.

Here is a snippet of the relevant code. In the actual program, the data in stu.vector and avg.vector are obtained from the data frame as the program loops through each row.

stu.vector <- c(2500, 2510, NA, 2600)
avg.vector <- c(2635, 2589, 2628, 2685)
x <- c(0, 1, 2, 3)
graph.min <- min(stu.vector, avg.vector, na.rm = TRUE) - 150
graph.max <- max(stu.vector, avg.vector, na.rm = TRUE) + 150
plot(x, stu.vector, ylim = c(graph.min, graph.max), xlab = "", ylab = "Scaled Score", xaxt = 'n', pch = 2, col = 'blue', main = "Math Growth Rate")
points(x, avg.vector, pch = 1, col = 'red')
lines(x, stu.vector, lty = 1, col = 'blue')
lines(x, avg.vector, lty = 3, col = 'red')

If the NA did not exist in stu.vector, all points would be connected with lines. However, in some cases data are missing, and I still need to connect the data in stu.vector with lines. In this case, the line would connect points 1 and 2, and then points 2 and 4, even though point 3 is missing.

Replace the first lines statement with:

lines(approx(x, stu.vector), lty = 1, col = 'blue')
Re: [R] R integration with Microsoft Powerpoint
Of course there are many ways to do it. The user input could come from R dialogs via the tcltk package or from the Input() dialogs of VBA in PowerPoint. I chose PDF as the output.

The R source code, called cars.r, might go something like:

pdf(file = paste(getwd(), "/", "cars.pdf", sep = ""), width = 8.5, height = 11,
    onefile = TRUE, family = "Helvetica", title = "R Graphics Output",
    fonts = NULL, version = "1.1")
plot(cars)
lines(lowess(cars))
graphics.off()
shell(paste(getwd(), "/", "cars.pdf", sep = ""), wait = FALSE)  # view PDF
stop("all done")

The cars.bat file might go something like:

"C:\Program Files\R\rw2010\bin\R.exe" CMD BATCH c:\myfiles\r\cars.r

Change the drives and paths to R.exe and the cars.r file as needed.

The cars.bat file was played from PowerPoint by creating the object and double-clicking it in the slideshow. There are other ways to do it, of course. In PowerPoint, click the menu item Insert | Object | Create and browse to and select the cars.r file. I set the object as an icon and used the R.exe icon. VBA scripting to play the cars.bat file is not all that involved either.

___ mailto:[EMAIL PROTECTED] Kenneth Ray Hobson, P.E. Oklahoma DOT - QA IAS Manager 200 N.E. 21st Street Oklahoma City, OK 73105-3204 (405) 522-4985, (405) 522-0552 fax Visit our website at: http://www.okladot.state.ok.us/materials/materials.htm
[R] R integration with Microsoft Powerpoint
...snip... In PowerPoint, click the menu item Insert | Object | Create and browse to and select the cars.r file. ...snip...

In the previous post's snippet above, replace cars.r with cars.bat.

To run the cars.bat program via VBA, I would typically insert a button. To do so in PowerPoint, right-click the toolbar, select Control Toolbar and then click the button icon. Right-click and drag to draw the button onto the slide. Double-click the button object and add code something like:

Private Sub CommandButton1_Click()
    Shell ("c:\myfiles\r\cars.bat")
End Sub

When passing input to a program like R, I typically use VBA's Input() and write the results to a TXT file. This is then easily read into R.

mailto:[EMAIL PROTECTED] Kenneth Ray Hobson, P.E. Oklahoma DOT - QA IAS Manager 200 N.E. 21st Street Oklahoma City, OK 73105-3204 (405) 522-4985, (405) 522-0552 fax Visit our website at: http://www.okladot.state.ok.us/materials/materials.htm
[R] lapply
Hi, all:

I have a program here, but it runs slowly, and I am wondering whether there is some place I can change to make it run faster.

I have two lists, scd and c1, like this:

scd[1:2]
[[1]]
[1]  54 241

[[2]]
[1] 52 53
...

c1[1:2]
[[1]]
 [1]  13  30  92  93  13  94  30  95  96  97  98  99
[13]   8  19  31 100 101  29

[[2]]
[1] 13 55

length(scd)
[1] 2542
length(c1)
[1] 31859

My target: for each element of scd, I need to know how many times it (as a whole) occurs in c1. My code is:

N <- length(scd)   # num of word_comb
M <- length(c1)    # num of class1
g1 <- lapply(1:N, function(i) lapply(1:M, function(j) all(scd[[i]] %in% c1[[j]])))
a <- lapply(1:N, function(i) sum(g1[[i]] == T))

My questions:

1. The calculation of g1 is very slow.

2. How can I do the following using apply?

tab <- array(as.integer(0), dim = c(2, 2, N))
for (i in 1:N) {
  tab[2, 1, i] <- a[[i]]
}
tab[2, 2, ] = M - tab[2, 1, ]

Thanks,

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
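One possible restructuring of the code in the question, shown on toy lists (the values are made up): the counts are accumulated directly instead of first building a nested list of logicals, and the for loop over tab is replaced by a vectorised assignment. Whether this is fast enough for 2542 x 31859 comparisons is not something this sketch establishes.

```r
# Toy stand-ins for the poster's lists.
scd <- list(c(54, 241), c(52, 53), 13)
c1  <- list(c(13, 30, 92, 54, 241), c(13, 55), c(52, 53, 54, 241), c(52, 13))

N <- length(scd)
M <- length(c1)

# For each element of scd, count the elements of c1 that contain all of its values.
a <- vapply(scd, function(s) {
  sum(vapply(c1, function(cl) all(s %in% cl), logical(1)))
}, integer(1))

tab <- array(0L, dim = c(2, 2, N))
tab[2, 1, ] <- a                 # replaces the for loop over i
tab[2, 2, ] <- M - tab[2, 1, ]
```

With these toy lists, a is c(2, 1, 3): for instance, the pair (54, 241) is fully contained in the first and third elements of c1.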
Re: [R] OT: How to instaill gcc in cygwin?
Wensui Liu liuwensui at gmail.com writes:

I know it is far off topic. But I do know there must be some people here who know it very well.

Click the 'install now' button at cygwin.org, and when the selection box appears in the install process, also select gcc. There is a _lot_ of stuff available for Cygwin that the default install ignores.

That said, it won't help you for R, as you cannot build R under Cygwin. See the R Extensions and R Admin manuals for details; you'd want MinGW, a cousin of Cygwin, on the PC.

Hth, Dirk
Re: [R] Lines for plot (Sweave)
You can use:

    lines(x[!is.na(stu.vector)], stu.vector[!is.na(stu.vector)], lty=1, col='blue')

-Don

At 1:43 PM -0400 7/1/05, Doran, Harold wrote:

Dear List: I am generating a series of plots iteratively using Sweave. In short, a dataframe is subsetted row by row and variable graphics are created conditional on the data in each row. In this particular case, this code ends up generating 17,000 individual plots. In some cases, all student data (this is working with student achievement data) are available and my code below works very well, in the sense that a line connects all points. However, in some cases there are missing data, and I need to modify my code so that lines are connected through all points even when data are missing. Here is a snip of relevant code. In the actual program, the data in stu.vector and avg.vector are obtained from the dataframe as the program loops through each row.

    stu.vector <- c(2500, 2510, NA, 2600)
    avg.vector <- c(2635, 2589, 2628, 2685)
    x <- c(0,1,2,3)
    graph.min <- min(stu.vector, avg.vector, na.rm=TRUE)-150
    graph.max <- max(stu.vector, avg.vector, na.rm=TRUE)+150
    plot(x, stu.vector, ylim=c(graph.min,graph.max), xlab=" ", ylab="Scaled Score", xaxt='n', pch=2, col='blue', main="Math Growth Rate")
    points(x, avg.vector, pch=1, col='red')
    lines(x, stu.vector, lty=1, col='blue')
    lines(x, avg.vector, lty=3, col='red')

If the NA did not exist in the object stu.vector, then all points would be connected with lines. However, in some cases data are missing and I still need to connect the data in stu.vector with lines. So in this case, the line would connect points 1 and 2, and then 2 and 4, even though point 3 is missing. Can anyone suggest how I might do this? Thanks, Harold

-- Don MacQueen, Environmental Protection Department, Lawrence Livermore National Laboratory, Livermore, CA, USA
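For readers who want to try Don's suggestion directly, here is a minimal self-contained sketch using the data from Harold's message. The helper name `ok` is introduced here purely for illustration, and `range()` is used in place of the separate `min()`/`max()` calls; neither is from the original posts.

```r
# Data from the question; the third student score is missing.
stu.vector <- c(2500, 2510, NA, 2600)
avg.vector <- c(2635, 2589, 2628, 2685)
x <- c(0, 1, 2, 3)

# 'ok' (a name assumed for this sketch) flags the observed student scores.
ok <- !is.na(stu.vector)

# Same axis limits as in the question: data range padded by 150 on each side.
y.range <- range(stu.vector, avg.vector, na.rm = TRUE) + c(-150, 150)

plot(x, stu.vector, ylim = y.range, xlab = "", ylab = "Scaled Score",
     xaxt = "n", pch = 2, col = "blue", main = "Math Growth Rate")
points(x, avg.vector, pch = 1, col = "red")

# Dropping the NA position from both x and y lets the line run from
# point 2 straight to point 4.
lines(x[ok], stu.vector[ok], lty = 1, col = "blue")
lines(x, avg.vector, lty = 3, col = "red")
```

Subsetting both vectors with the same logical index keeps the x and y values aligned, which is what makes the line skip the missing observation rather than break at it.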
[R] Setting lattice boxplot's lines to black
Hi R users. I'm using the lattice library and I need a print version of my graphics. How do I set my boxplot's lines to black?

    tpl <- trellis.par.get("plot.line")
    tpl$col <- "black"
    trellis.par.set("plot.line", tpl)

doesn't work: the boxplot's lines aren't black. Thanks a lot. This is my script.

    library(lattice)
    # Set the background colour to white
    tbg <- trellis.par.get("background")
    tbg$col <- "white"
    trellis.par.set("background", tbg)
    # Set the strip background colour to white
    tsbg <- trellis.par.get("strip.background")
    tsbg$col <- "white"
    trellis.par.set("strip.background", tsbg)
    # Set the symbol colour to black
    tps <- trellis.par.get("plot.symbol")
    tps$col <- "black"
    trellis.par.set("plot.symbol", tps)
    # Set the line colour to black
    tpl <- trellis.par.get("plot.line")
    tpl$col <- "black"
    trellis.par.set("plot.line", tpl)
    print(bwplot(Sepal.Length ~ Species, data=iris))
    # This works with xyplot but doesn't work with bwplot: the boxplot's
    # lines aren't black.

How do I set my boxplot's lines to black? Thanks a lot.

-- Mario Alfonso Morales Rivera, Profesor Auxiliar, Departamento de Matemáticas y Estadística, Universidad de Córdoba, Colombia
[R] Generating correlated data from uniform distribution
Dear R users, I want to generate two random variables (X1, X2) from the uniform distribution on (-0.5, 0.5) with a specified correlation coefficient r. Does anyone know how to do it in R? Many thanks! Menghui
Re: [R] Setting lattice boxplot's lines to black
On 7/1/05, Mario Alfonso Morales Rivera [EMAIL PROTECTED] wrote: Hi R users. I'm using the lattice library and I need a print version of my graphics. How do I set my boxplot's lines to black?

(As I said in a private reply,) you seem to want a black and white plot, so use trellis.device(color = FALSE) to initialize the device.

Deepayan
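If only the box-and-whisker elements need to change, rather than the whole device theme, a possible alternative is to set the lattice parameters that bwplot draws with. The sketch below assumes the standard lattice settings `box.rectangle`, `box.umbrella`, `box.dot` and `plot.symbol`, passed through the `par.settings` argument. It is offered as an illustration alongside Deepayan's suggestion, not as what he recommended; the names `bw.theme` and `p` are made up for the example.

```r
library(lattice)

# bwplot draws its boxes and whiskers from box.rectangle and box.umbrella,
# not from plot.line, which may be why changing plot.line had no visible effect.
bw.theme <- list(
  box.rectangle = list(col = "black"),  # the box outline
  box.umbrella  = list(col = "black"),  # the whiskers
  box.dot       = list(col = "black"),  # the median dot
  plot.symbol   = list(col = "black")   # points beyond the whiskers
)

# par.settings applies the theme to this plot only.
p <- bwplot(Sepal.Length ~ Species, data = iris, par.settings = bw.theme)
print(p)
```

Passing the settings per plot avoids changing global device state, which can be convenient when colour and black-and-white versions are produced in the same session.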
Re: [R] Generating correlated data from uniform distribution
    dat <- matrix(runif(2000),2,1000)
    rho <- .77
    R <- matrix(c(1,rho,rho,1),2,2)
    ch <- chol(R)
    dat2 <- t(ch)%*%dat
    cor(dat2[1,],dat2[2,])
    [1] 0.7513892

    dat <- matrix(runif(2),2,1)
    rho <- .28
    R <- matrix(c(1,rho,rho,1),2,2)
    ch <- chol(R)
    dat2 <- t(ch)%*%dat
    cor(dat2[1,],dat2[2,])
    [1] 0.2681669

    dat <- matrix(runif(20),2,10)
    rho <- .28
    R <- matrix(c(1,rho,rho,1),2,2)
    ch <- chol(R)
    dat2 <- t(ch)%*%dat
    cor(dat2[1,],dat2[2,])
    [1] 0.2814035

See ?chol

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Menghui Chen Sent: July 1, 2005 4:49 PM To: r-help@stat.math.ethz.ch Subject: [R] Generating correlated data from uniform distribution

Dear R users, I want to generate two random variables (X1, X2) from uniform distribution (-0.5, 0.5) with a specified correlation coefficient r. Does anyone know how to do it in R? Many thanks! Menghui
Re: [R] Generating correlated data from uniform distribution
Isn't this a little trickier with non-normal variables? It sounds like Menghui Chen wants variables that have uniform marginal distributions and a specified correlation. When I look at histograms (or just the quantiles) of the rows of dat2 in your example, I see something for dat2[2,] that does not look much like it comes from a uniform distribution.

    dat <- matrix(runif(2000),2,1000)
    rho <- .77
    R <- matrix(c(1,rho,rho,1),2,2)
    ch <- chol(R)
    dat2 <- t(ch)%*%dat
    cor(dat2[1,],dat2[2,])
    [1] 0.7513892
    hist(dat2[1,])
    hist(dat2[2,])
    quantile(dat2[1,])
             0%         25%         50%         75%        100%
    0.000655829 0.246216035 0.507075912 0.745158441 0.16418
    quantile(dat2[2,])
           0%       25%       50%       75%      100%
    0.0393046 0.4980066 0.7150426 0.9208855 1.3864704

-- Tony Plate
Re: [R] Generating correlated data from uniform distribution
How about tetrachoric correlations? Generate correlated normal observations, then convert to uniform using pnorm:

    rho <- 0.9
    Cor <- array(c(1, rho, rho, 1), dim=c(2,2))
    library(mvtnorm)
    set.seed(1)
    Y <- rmvnorm(1, sigma=Cor)
    X <- pnorm(Y)-0.5
    plot(X)
    hist(X[,1])
    hist(X[,2])
    cor(X)

Enjoy. spencer graves

Tony Plate wrote: Isn't this a little trickier with non-normal variables? It sounds like Menghui Chen wants variables that have uniform marginal distribution, and a specified correlation. When I look at histograms (or just the quantiles) of the rows of dat2 in your example, I see something for dat2[2,] that does not look much like it comes from a uniform distribution.

-- Spencer Graves, PhD, Senior Development Engineer, PDF Solutions, Inc., 333 West San Carlos Street Suite 700, San Jose, CA 95110, USA [EMAIL PROTECTED] www.pdf.com http://www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915
Re: [R] Generating correlated data from uniform distribution
Yes, you are right; I guess this works only for normal data. Free advice sometimes comes with too little consideration :-) Sorry about that, and thanks to Spencer for the correct way.

-Original Message- From: Tony Plate [mailto:[EMAIL PROTECTED] Sent: July 1, 2005 6:01 PM To: Jim Brennan Cc: 'Menghui Chen'; r-help@stat.math.ethz.ch Subject: Re: [R] Generating correlated data from uniform distribution

Isn't this a little trickier with non-normal variables? It sounds like Menghui Chen wants variables that have uniform marginal distribution, and a specified correlation. When I look at histograms (or just the quantiles) of the rows of dat2 in your example, I see something for dat2[2,] that does not look much like it comes from a uniform distribution.
Re: [R] Generating correlated data from uniform distribution
Jim Brennan [EMAIL PROTECTED] writes:

Yes you are right I guess this works only for normal data. Free advice sometimes comes with too little consideration :-)

Worth every cent...

Sorry about that and thanks to Spencer for the correct way.

Hmm, but is it? Or rather, what is the relation between the correlation of the normals and that of the transformed variables? Looks nontrivial to me. Incidentally, here's a way that satisfies the criteria, but in a rather weird way:

    N <- 1
    rho <- .6
    x <- runif(N, -.5, .5)
    y <- x * sample(c(1,-1), N, replace=TRUE, prob=c((1+rho)/2,(1-rho)/2))

-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907 ([EMAIL PROTECTED])
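A quick numerical check of Peter's construction may help show why it meets the stated criteria. Because the random sign s is independent of x and has mean rho, Cov(x, y) = E[x^2] E[s] = rho Var(x), while y keeps the same symmetric uniform distribution as x. The sample size and seed below are assumptions chosen for illustration, as is the name `s`; they are not values from the post.

```r
set.seed(1)       # arbitrary seed, for reproducibility only
N <- 100000       # illustrative sample size (an assumption)
rho <- 0.6

x <- runif(N, -0.5, 0.5)
# s is +1 with probability (1 + rho)/2 and -1 otherwise, so E[s] = rho.
s <- sample(c(1, -1), N, replace = TRUE,
            prob = c((1 + rho) / 2, (1 - rho) / 2))
y <- x * s

cor(x, y)    # should be close to rho
range(y)     # y stays inside (-0.5, 0.5)
# plot(x, y) would show every point on one of the two diagonals
# y = x or y = -x, which is the "weird" part.
```

The marginals and the correlation are as requested, but the joint distribution is concentrated on two lines, so whether this is acceptable depends on what the correlated uniforms are meant to model.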
[R] scope argument in step function
Thanks a lot for help in advance. I am switching from Matlab to R and I guess I need some time to get rolling. I was wondering why this code:

    fit.0 <- lm(Response ~ 1, data = ds3)
    step(fit.0, scope=list(upper=~., lower=~1), data=ds3)
    Start: AIC= -32.66
    Response ~ 1

    Call: lm(formula = Response ~ 1, data = ds3)
    Coefficients:
    (Intercept)
          1.301

does not behave the same way as the following:

    cnames <- dimnames(ds3)[[2]]
    cnames <- cnames[-444]  # last col is Response
    fmla <- as.formula(paste("~", paste(cnames, collapse="+")))
    step(fit.0, scope=list(upper=fmla, lower=~1), data=ds3)
    Start: AIC= -32.66
    Response ~ 1

    fmla <- as.formula(paste("~", paste(cnames, collapse="+")))
    fit.s <- step(fit.0, scope=list(upper=fmla, lower=~1), data=ds3)
    Step: AIC= -Inf
    Response ~ ENTP9324 + CH1W0281

               Df Sum of Sq     RSS  AIC
    <none>                        0 -Inf
    - CH1W0281  3   0.00381 0.00381 -115
    - ENTP9324  9         1       1  -34

The dataframe ds3 is 17 by 444, and I understand it is not a smart thing to run stepwise regression on it. What I wondered is this: if I pass 'upper=~.', it seems step() thinks the full model is the current one, and does not add any more terms. If this is the right answer, is there a better way than creating the fmla argument as above? Thanks! -Young.
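One way to avoid the paste() construction is sketched below. As far as I understand it, step() interprets the scope formulas relative to the current model (via update.formula), so `~ .` starting from `Response ~ 1` just reproduces the intercept-only model, which would explain the behaviour Young saw. The sketch uses the built-in `mtcars` data as a stand-in for ds3, with `mpg` playing the role of Response; those names, and `fit0`, `upper` and `fit`, are assumptions for illustration only.

```r
dat <- mtcars                     # stand-in for ds3 (an assumption)
fit0 <- lm(mpg ~ 1, data = dat)   # intercept-only starting model

# reformulate() builds a one-sided formula from a character vector of
# term labels, here every column except the response.
upper <- reformulate(setdiff(names(dat), "mpg"))
upper

fit <- step(fit0, scope = list(lower = ~ 1, upper = upper),
            direction = "forward", trace = 0)
formula(fit)
```

Selecting the predictors by name with setdiff() also avoids relying on the response being in a particular column position, as `cnames[-444]` does.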
Re: [R] Generating correlated data from uniform distribution
OK, now I am skeptical, especially when you say "in a weird way" :-) This may be OK, but look at plot(x,y) and I am suspicious. Is it still all right with this kind of relationship? For large N it appears Spencer's method is returning a slightly lower correlation for the uniforms than for the normals, so maybe there is a problem!?! Hope we are all learning something and Menghui gets/has what he wants. :-)
Re: [R] Generating correlated data from uniform distribution
Peter is absolutely correct: The correlation I used was for a hidden normal process, not for the resultant correlated uniforms. This is similar to but different from tetrachoric correlations, about which there is a substantial literature (including an R package polycor). Why do you want correlated uniforms? What do they represent physically? Does it matter if you can match exactly a particular correlation coefficient, or is it enough to say that they are uniformly distributed random variables such that their normal scores have a specified correlation coefficient? There is so much known about the multivariate normal distribution and so little about correlated uniforms that it might be more useful to know the correlations of latent normals, for which your uniforms are what are measured. spencer graves

-- Spencer Graves, PhD, Senior Development Engineer, PDF Solutions, Inc., 333 West San Carlos Street Suite 700, San Jose, CA 95110, USA [EMAIL PROTECTED] www.pdf.com http://www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915
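On Peter's question about the relation between the two correlations: for a bivariate normal pair with correlation rho, the uniforms obtained through pnorm() have correlation (6/pi) * asin(rho/2), which is slightly smaller than rho and matches what Jim noticed. Inverting that gives rho = 2 * sin(pi * r / 6) for a target uniform correlation r. The sketch below applies that adjustment using base R only. It is an illustration under that stated relation, not something posted in the thread, and the names `n`, `r`, `rho.z`, `z1`, `z2` and `X` are assumptions.

```r
set.seed(1)                     # arbitrary seed
n <- 100000                     # illustrative sample size
r <- 0.6                        # target correlation of the uniforms

# Correlation needed on the latent normal scale so that the transformed
# uniforms end up with correlation r.
rho.z <- 2 * sin(pi * r / 6)

# Two standard normals with correlation rho.z.
z1 <- rnorm(n)
z2 <- rho.z * z1 + sqrt(1 - rho.z^2) * rnorm(n)

# pnorm() maps each normal to Uniform(0, 1); shift to (-0.5, 0.5).
X <- cbind(pnorm(z1), pnorm(z2)) - 0.5

cor(X)[1, 2]                    # should be close to r
```

Unlike the two-diagonal construction, this keeps a smooth joint distribution, at the cost of fixing the dependence structure to the one implied by the latent normals.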
Re: [R] Generating correlated data from uniform distribution
Dear Menghui, You may consider looking in Luc Devroye's Non-Uniform Random Variate Generation. Despite its title, section XI.3.2 describes how to generate bivariate uniforms. The book is out of print, but Devroye himself urges you to print it from his scanned PDFs(!): http://cgm.cs.mcgill.ca/~luc/rnbookindex.html Hope this helps, alejandro

On 7/1/05, Menghui Chen [EMAIL PROTECTED] wrote: Dear R users, I want to generate two random variables (X1, X2) from uniform distribution (-0.5, 0.5) with a specified correlation coefficient r. Does anyone know how to do it in R? Many thanks! Menghui
[R] as.Date today ?
I have a Date variable that I constructed with as.Date(). How can I compare it to today (<, >, ==)?
Re: [R] as.Date today ?
The following is a minor modification of examples in the help for as.Date:

    x <- c("1jan1960", "2jan1960", "31mar1960", "30jul2006")
    z <- as.Date(x, "%d%b%Y")
    z < Sys.Date()
    [1] TRUE TRUE TRUE FALSE

How's this? spencer graves

Omar Lakkis wrote: I have a Date variable that I constructed with as.Date(). How can I compare it to today (<, >, ==)?

-- Spencer Graves, PhD, Senior Development Engineer, PDF Solutions, Inc., 333 West San Carlos Street Suite 700, San Jose, CA 95110, USA [EMAIL PROTECTED] www.pdf.com http://www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915
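A slightly fuller sketch of the same idea follows, using ISO-format dates so that no locale-dependent month names are involved. The two dates and the names `d` and `today` are chosen only for illustration.

```r
# One date safely in the past and one far in the future (illustrative values).
d <- as.Date(c("2005-07-01", "2999-12-31"))
today <- Sys.Date()

d < today     # earlier than today?
d > today     # later than today?
d == today    # exactly today?
today - d     # difference as a difftime, in days
```

Date objects compare like numbers, so the usual relational operators and subtraction work without any conversion.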
[R] Is it possible to use glm() with 30 observations?
I have a very simple problem. When using glm to fit a binary logistic regression model, sometimes I receive the following warning:

    Warning messages:
    1: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,
    2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart,

What does this output tell me? Since I only have 30 observations, I assume this is a small-sample problem. Is it possible to fit this model in R with only 30 observations? Could any expert provide suggestions to avoid the warning?
Re: [R] Lines for plot (Sweave)
A variation on your idea might be:

    fo <- stu.vector ~ x
    lines(fo, model.frame(fo), lty=1, col='blue')

On 7/1/05, Don MacQueen [EMAIL PROTECTED] wrote: You can use:

    lines(x[!is.na(stu.vector)], stu.vector[!is.na(stu.vector)], lty=1, col='blue')

-Don
Re: [R] Is it possible to use glm() with 30 observations?
The issue is not 30 observations but whether it is possible to perfectly separate the two possible outcomes. Consider the following:

    tst.glm <- data.frame(x=1:3, y=c(0, 1, 0))
    glm(y~x, family=binomial, data=tst.glm)
    tst2.glm <- data.frame(x=1:1000, y=rep(0:1, each=500))
    glm(y~x, family=binomial, data=tst2.glm)

The algorithm fits y~x to tst.glm without complaining, but issues warnings for tst2.glm. This is called the Hauck-Donner effect, and RSiteSearch("Hauck-Donner") just now produced 8 hits. For more information, look for Hauck-Donner in the index of Venables, W. N. and Ripley, B. D. (2002) _Modern Applied Statistics with S._ New York: Springer. (If you don't already have this book, I recommend you give serious consideration to purchasing a copy. It is excellent on many issues relating to statistical analysis and R.) Spencer Graves

Kerry Bush wrote: I have a very simple problem. When using glm to fit binary logistic regression model, sometimes I receive the following warning: Warning messages: 1: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, 2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, What does this output tell me? Since I only have 30 observations, i assume this is a small sample problem. Is it possible to fit this model in R with only 30 observations? Could any expert provide suggestions to avoid the warning?

-- Spencer Graves, PhD, Senior Development Engineer, PDF Solutions, Inc., 333 West San Carlos Street Suite 700, San Jose, CA 95110, USA [EMAIL PROTECTED] www.pdf.com http://www.pdf.com Tel: 408-938-4420 Fax: 408-280-7915
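To see the separation point in practice, the following sketch refits Spencer's second example, records the warnings that glm() raises, and then checks directly whether x splits the two outcomes perfectly. The warning-capturing wrapper and the names `msgs`, `fit` and `separated` are assumptions introduced for illustration; only the data frame comes from the message above.

```r
# Spencer's perfectly separated example: y is 0 for x <= 500, 1 otherwise.
tst2.glm <- data.frame(x = 1:1000, y = rep(0:1, each = 500))

# Collect warning messages instead of letting them print.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial, data = tst2.glm),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
msgs

# A direct check for complete separation on a single predictor:
# every x with y == 0 lies below every x with y == 1.
separated <- with(tst2.glm, max(x[y == 0]) < min(x[y == 1]))
separated
```

When a check like this is TRUE, the maximum likelihood estimate of the slope is not finite, so the warning reflects the structure of the data rather than the sample size as such.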