[R] apply and sort vs vectorized order
Dear all,

While trying to solve an earlier problem (see the thread "putting NAs at the end"), I noticed a difference in system time between using apply() with sort() (or order()) to order each row or column of a matrix and a vectorized function I wrote. apply() is much faster when the number of loops (the number of rows or columns to order) is low, BUT much slower when the number of loops is high and the other dimension is short.

Here is my function:

order.rc <- function(A, row.column = 1, na.last = TRUE, decreasing = FALSE, return.sort = TRUE)
{
  # shifts A so that no value is negative (min(A) becomes 0 when min(A) < 0)
  A.order <- A + abs(min(A, na.rm = TRUE))
  # rescales A so max(A) = 0.1
  A.order <- A.order / (max(A.order, na.rm = TRUE) * 10)
  # makes NAs = 0 (na.last = FALSE) or NAs = 0.9 (na.last = TRUE)
  # NOTE: if decreasing is TRUE, NAs are the inverse of the above
  if ((na.last & !decreasing) | (!na.last & decreasing))
    A.order[which(is.na(A.order))] <- 0.9
  else
    A.order[which(is.na(A.order))] <- 0
  # if ordering each row (row.column = 1), the integer part of A.order is the row index;
  # else we are ordering each column, so the integer part of A.order is the column index
  if (row.column == 1)
    A.order <- A.order + rep(1:nrow(A), ncol(A))
  else
    A.order <- A.order + rep(1:ncol(A), each = nrow(A))
  # returns either a matrix with sorted values or the ordering indexes
  if (return.sort) {
    A.order <- A[order(A.order, decreasing = decreasing)]
    if (row.column == 1) {
      dim(A.order) <- dim(t(A))
      A.order <- t(A.order)
    } else
      dim(A.order) <- dim(A)
    return(A.order)
  } else
    return(order(A.order, decreasing = decreasing))
}

# Some system time comparisons
# CHANGE Nrandom ACCORDING TO YOUR SYSTEM
Nrandom = 1000

A <- matrix(rnorm(Nrandom*Nrandom), nrow = Nrandom, ncol = Nrandom)
A[rbind(c(100,3), c(90,9), c(40,6))] <- NA
system.time({A.r <- order.rc(A)})
system.time(A.s1 <- apply(A, 1, sort))
system.time({A.c <- order.rc(A, row.column = 2)})
system.time(A.s2 <- apply(A, 2, sort))

A <- matrix(rnorm(Nrandom*Nrandom), nrow = Nrandom*Nrandom/10, ncol = 10)
A[rbind(c(100,3), c(90,9), c(40,6))] <- NA
system.time({A.r <- order.rc(A)})
system.time(A.s1 <- apply(A, 1, order))
system.time({A.c <- order.rc(A, row.column = 2)})
system.time(A.s2 <- apply(A, 2, order))

I think only the third apply is slower than my function, because the number of loops is too high, and my function is faster despite the long vector to order.

Thanks for any clarifications on how all this works,

Angel

__ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
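For comparison, the same "one big order() instead of many small ones" idea can be sketched without the rescaling trick, by passing the row index to order() as the first key and the values as the second. This is only a minimal sketch, not the poster's function: sort_rows is a hypothetical name, and it assumes increasing order with NAs placed last in each row.

```r
# Hypothetical helper: sort every row of a numeric matrix with a single order() call.
sort_rows <- function(A) {
  # first key: row index (never NA); second key: the values, NAs last within a row
  o <- order(as.vector(row(A)), as.vector(A), na.last = TRUE)
  # A[o] lists row 1 sorted, then row 2 sorted, ...; refill the matrix row by row
  matrix(A[o], nrow = nrow(A), byrow = TRUE)
}

A <- matrix(c(3, 1, 2,
              NA, 5, 4), nrow = 2, byrow = TRUE)
sorted <- sort_rows(A)
```

Whether this is faster than apply() will depend on the matrix shape, just as in the timings above; system.time() on your own data is the only reliable guide.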
RE: [R] rterm not shutting down from ESS on Win32/could we help?
Hi to all who suffer from rterm not shutting down in xemacs/ESS on windows NT or 2000. Also hi to those who could eventually help. Here is some more information which could help and some ENCOURAGEMENT to contribute to a solution. 1. It may be an xemacs problem but it is more likely an interaction between rterm/comint/and xemacs. In fact, the problem started occurring around version R 1.6.0. I remember that it wasn't there in 1.5.1. It was not associated to a simultaneous upgrade of xemacs or of the windows operating system. So, there is an R contribution to the problem. 2. I'm also using other tools which pass through the command interpreter (I tried running mysql and python). Also those applications can under conditions which I cannot reproduce yet lead to not shutting down appropriately. So, there seems to be an xemacs contribution. 3. Here are a few questions which may help us figure out where exactly this is going wrong: - does it also happen with other dialects using ESS, such as S+, SAS, ...? - does it also happen with current gnu-emacs on a system on which it happens with xemacs? - has anyone succeeded to run it within a debugger? what did you learn from it? 4. Would there be an obvious person who could centralize this information? Christian Ritter Functional Specialist Statistics Shell Coordination Centre S.A. Monnet Centre International Laboratory, Avenue Jean Monnet 1, B-1348 Louvain-La-Neuve, Belgium Tel: +32 10 477 349 Fax: +32 10 477 219 Email: [EMAIL PROTECTED] Internet: http://www.shell.com/chemicals -Original Message- From: A.J. Rossini [mailto:[EMAIL PROTECTED] Sent: Saturday, August 16, 2003 12:21 AM To: Jeff D. Hamann Cc: [EMAIL PROTECTED] Subject: Re: [R] rterm not shutting down from ESS on Win32 Jeff D. Hamann [EMAIL PROTECTED] writes: I've been having problems with Rterm.exe not shutting down when I exit an R (1.7.0 and 1.7.1) session from within emacs when using ESS. I've just upgraded to 5.1.24 and still have the same problems. 
I'm running ntemacs and winxp. I don't recall having these troubles with version 1.6.2 of R.

Known problem. R-core claims it's an Emacs problem. It's not an ESS problem, but we are looking into it. Might not be solvable. We'll see... (I'll have better access to a Windows box soon).

best, -tony

-- A.J. Rossini [EMAIL PROTECTED] http://www.analytics.washington.edu/ Biomedical and Health Informatics University of Washington Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center UW : FAX=206-543-3461 | moving soon to a permanent office FHCRC: 206-667-7025 FAX=206-667-4812 | Voicemail is pretty sketchy/use Email
RE: [R] New package: irregular time-series (its)
The issue to which you refer is putting 2 time-series with non-aligned (or inconsistent) time-stamps onto a consistent basis. There are four functions to assist with this task in the 'its' package: try ?its-join for help. In essence, there are two approaches to the problem. In one, you take the union of the time-stamps as the consistent basis, and in the other approach, you take the intersection. From what you say, it sounds like you want the union, for your application, i.e. unionIts(A,B). You can then apply the interpolation function of your choice. If you have further points to raise, and the documentation provided does not answer your questions, I suggest you contact me off-list. Regards Giles Heywood -Original Message- From: Fan [mailto:[EMAIL PROTECTED] Sent: 17 August 2003 17:12 To: Heywood, Giles Subject: Re: [R] New package: irregular time-series (its) Hello Giles, Congratulations for your contributed R package its. I'm having a little problem, and I'd like to know if there's already a general function in its (or other packages) to manage it. If not, could you add such function in its ? With irregular time series, we need sometimes MERGE 2 or several time series with different periodicities, for example: A = a monthly regular time series (ex. end of each month), B = a business daily time series To analyse conjointly the 2 series, one need to expand A to a daily series with the same time points as B (with the option of different method of interpolation: constant, linear, spline, etc.). Thanks in advance -- Fan Heywood, Giles wrote: I have uploaded to CRAN a new package named 'its' (Irregular Time-Series). It implements irregular time-series as an S4 class, extending the matrix class, and records the time-stamp of each row in the matrix using POSIX. Print, plot, extraction, append, and related functionality are available. Feedback and suggestions are welcome. 
Giles Heywood
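The "put both series on a common set of time-stamps, then interpolate" step discussed in this thread can be sketched in base R alone. This is a hedged illustration and deliberately does not use the 'its' API: the series, dates, and variable names below are made up, and approx() stands in for "the interpolation function of your choice".

```r
# Hypothetical monthly series A (month-end values) and the time points of a
# business-daily series B; all numbers are invented for illustration.
a_time <- as.Date(c("2003-05-30", "2003-06-30", "2003-07-31"))
a_val  <- c(10, 13, 16)

b_time <- seq(as.Date("2003-06-02"), as.Date("2003-07-31"), by = "day")
b_time <- b_time[as.POSIXlt(b_time)$wday %in% 1:5]   # keep Monday to Friday

# Expand A onto B's time points, once linearly and once as a step function
# that carries the last monthly value forward (f = 0).
a_lin  <- approx(as.numeric(a_time), a_val, xout = as.numeric(b_time),
                 method = "linear")$y
a_step <- approx(as.numeric(a_time), a_val, xout = as.numeric(b_time),
                 method = "constant", f = 0)$y

expanded <- data.frame(time = b_time, linear = a_lin, step = a_step)
```

For spline interpolation, spline() or splinefun() could be substituted for approx() in the same way.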
[R] 3D pie
Hello, is there a function for 3D pie representation in R ? Thanks Klaus-P. -- Dr. Klaus-Peter Pleissner Max Planck Institute for Infection Biology Campus Charité Mitte Schumannstr. 21/22 D-10117 Berlin Germany phone: +49-30-28460-119 fax: +49-30-28460-507 email: [EMAIL PROTECTED]
Re: [R] Would like to apply a weight variable to the summary function in Hmisc
On Sun, 17 Aug 2003 21:31:38 -0500 Greg Blevins [EMAIL PROTECTED] wrote:

> Hello, In the Hmisc package, the functions describe and summarize can explicitly take a weight variable. My question is: can a weight variable be applied when using 'summary'? For example, using summary(var1 ~ var2), I would like to weight the data by var3 (same length). Is this possible?

No, but you can do this with summarize (if all you want is cross-classification) using an example I posted on r-help a few weeks ago.

Frank

> Thanks a lot.
> Greg Blevins
> The Market Solutions Group, Inc.

--- Frank E Harrell Jr, Prof. of Biostatistics & Statistics, Div. of Biostatistics & Epidem., Dept. of Health Evaluation Sciences, U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
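The example Frank refers to is not reproduced in this thread, so the following is only a base-R sketch of a weighted cross-classification, not his code. The data frame is invented; var1, var2, and var3 follow the names used in the question.

```r
# Invented data: var1 = response, var2 = classification variable, var3 = weights.
d <- data.frame(var1 = c(1, 2, 3, 4, 5, 6),
                var2 = c("a", "a", "a", "b", "b", "b"),
                var3 = c(1, 1, 2, 3, 1, 1))

# Weighted mean of var1 within each level of var2.
wmeans <- sapply(split(d, d$var2),
                 function(g) weighted.mean(g$var1, g$var3))
wmeans
```

The same split-and-apply pattern should carry over to other weighted statistics by swapping out weighted.mean().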
Re: [R] 3D pie
On Mon, 18 Aug 2003 13:18:10 +0200 Klaus-Peter Pleissner [EMAIL PROTECTED] wrote:

> Hello, is there a function for 3D pie representation in R ? Thanks Klaus-P.

I hope not. See Edward Tufte's writings on chartjunk.

Frank

--- Frank E Harrell Jr, Prof. of Biostatistics & Statistics, Div. of Biostatistics & Epidem., Dept. of Health Evaluation Sciences, U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
Re: [R] 3D pie
Klaus-Peter Pleissner [EMAIL PROTECTED] wrote:

> is there a function for 3D pie representation in R ?

I certainly hope not!!!

(1) ``Given their low data density and failure to order numbers along a visual dimension, pie charts should never be used.'' (Tufte, Edward R., ``The Visual Display of Quantitative Information'', Graphics Press, Cheshire CT, 1983, p. 178.)

(2) ``3D pie charts are even worse, as they also add a visual distortion ...'' (``How to Construct Bad Charts and Graphs'', Klass, Gary, Department of Politics and Government, Illinois State University, 2001.) (http://lilt.ilstu.edu/gmklass/pos138/datadisplay/badchart.htm)

cheers, Rolf Turner [EMAIL PROTECTED]
Re: [R] 3D pie
On Mon, 2003-08-18 at 06:58, Frank E Harrell Jr wrote:

> On Mon, 18 Aug 2003 13:18:10 +0200 Klaus-Peter Pleissner [EMAIL PROTECTED] wrote:
> > Hello, is there a function for 3D pie representation in R ? Thanks Klaus-P.
> I hope not. See Edward Tufte's writings on chartjunk. Frank

I'll toss in one more: William Cleveland's "The Elements of Graphing Data", Chapter 4 (Graphical Perception), Section 10, called "Pop Charts".

HTH, Marc Schwartz
RE: [R] Any interest in commercial add-on libraries based on Cytel's StatXact/LogXact?
Another example: Jerry Friedman's MART is available in R from Salford for the same price as the stand-alone TreeNet, even though they don't advertise it on their web site. Andy -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Sunday, August 17, 2003 9:50 PM To: rhelp Cc: [EMAIL PROTECTED] Subject: [R] Any interest in commercial add-on libraries based on Cytel's StatXact/LogXact? At JSM, I spent a bit of time with old friends at the Cytel booth (makers of StatXact/LogXact). They were wondering whether it was both feasible and of interest to create a package of the StatXact compute engine for R (to be commercially licensed, not for free!), similar to what they've done for SAS. As far as I know, it's feasible, (this is not the first commercial external package, for those of you about to scream no, we can't allow commercial libraries!; in fact, you'd probably have to fork the R codebase if you truly insist on that. For example,CSIRO's Spot package for R, http://spot.cmis.csiro.au/spot/index.php) and so that remaining question is whether there would be sufficient interest for them to continue exploring the possibility from a financial perspective. Since it's not really on-topic for discussion on this mailing list, if you are interested and would imagine purchasing a license, please send mail to Pralay Senchaudhuri, pralay AT cytel.com. I'd imagine that the licensing fees would be similar to those charged for SAS or the standalone windows versions, but that is only my conjecture, and not Cytel's. (Truth in Advertising: my main interest is to make it easier to use their software for my work in clinical trials protocol design; I currently access their implementation through SAS. 
Also, while I am not currently involved with Cytel in any way (nor will be in the foreseeable future), I did spend part of 2 months editing and validating the examples from an early version of the LogXact manual as a grad student around 12 years ago, and have a number of good friends who still work there). best, -tony -- A.J. Rossini [EMAIL PROTECTED] http://www.analytics.washington.edu/ Biomedical and Health Informatics University of Washington Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center UW : FAX=206-543-3461 | moving soon to a permanent office FHCRC: 206-667-7025 FAX=206-667-4812 | Voicemail is pretty sketchy/use Email
Re: [R] vectorization question
Thank you very much to Tony Plate for his really clear explanation, and to Prof Ripley for his time solving this deficiency (IMHO).

On Friday 15 August 2003 08:44, Martin Maechler wrote:

> Thank you, Tony. This certainly was the most precise explanation on this thread.
>
> Everyone note, however, that this has been improved (by Brian Ripley) in the current R-devel {which should become R 1.8 in October}. There, $<- assignment of data frames also does check things, and in this case will do the same replication as the [,] or [[]] assignments do.
>
> For back compatibility (with S-plus and earlier R versions), I'd still recommend using bracket [ rather than $ assignment for data frames.
>
> Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/ Seminar fuer Statistik, ETH-Zentrum LEO C16, Leonhardstr. 27, ETH (Federal Inst. Technology), 8092 Zurich, SWITZERLAND phone: x-41-1-632-3408 fax: ...-1228

-- Alberto G. Murta Institute for Agriculture and Fisheries Research (INIAP-IPIMAR) Av. Brasilia, 1449-006 Lisboa, Portugal | Phone: +351 213027062 Fax: +351 213015948 | http://www.ipimar-iniap.ipimar.pt/pelagicos/
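Martin's recommendation to prefer bracket assignment can be illustrated with a tiny sketch. The data frame and column names here are hypothetical; the point is only that [,] and [[]] assignment replicate a length-one value to the number of rows.

```r
# Hypothetical three-row data frame.
df <- data.frame(x = 1:3)

# Bracket assignments: the scalar on the right is replicated to nrow(df).
df[, "flag"]  <- 0
df[["label"]] <- "a"

str(df)
```

In R versions with the fix Martin describes, `df$flag <- 0` should behave the same way, but the bracket forms are the ones he recommends when code also has to run on older versions or S-PLUS.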
Re: [R] 3D pie
R-help readers might also find amusing the new Tufte paper, "The Cognitive Style of PowerPoint", available from www.edwardtufte.com. (This is a non-commercial announcement.)

Roger Koenker, Department of Economics, University of Illinois, Champaign, IL 61820
url: www.econ.uiuc.edu/~roger/my.html   email: [EMAIL PROTECTED]
vox: 217-333-4558   fax: 217-244-6678

On Mon, 18 Aug 2003, Marc Schwartz wrote:

> I'll toss in one more: William Cleveland's "The Elements of Graphing Data", Chapter 4 (Graphical Perception), Section 10, called "Pop Charts". HTH, Marc Schwartz
[R] glmmPQL() and memory limitations
Hi, all,

When running glmmPQL(), I keep getting errors like

  Error: cannot allocate vector of size 61965 Kb
  Execution halted

This is R-1.7.1. The data set consists of about 11,000 binary responses from 16 subjects. The model is

  fixed  = SonResp ~ (ordered(Stop) + ordered(Son)) * StopResp,
  random = ~ 1 + (ordered(Stop) + ordered(Son)) * StopResp | Subj,
  family = binomial(link = logit)

SonResp and StopResp are binary; Stop and Son are ordered factors with six levels each.

The machine I'm running this on is my university's scientific server, a Beowulf Linux cluster; the node this job would run on has two 1.4 GHz CPUs, 2 gigabytes of RAM, and an 18-gigabyte hard disk, plus 130 gigabytes of scratch file space; it runs Red Hat Linux 7.2 with XFS.

Can anyone tell me whether this is
(a) a problem with the model (no machine could fit it in the lifetime of the universe),
(b) a problem with how I formulated the model (there's a way to get the same end result without overflowing memory),
(c) a problem with glmmPQL() (that could be fixed by using some other package),
(d) a problem with the machine I'm running it on (need more real or virtual memory), or
(e) other?

(Naturally, I've contacted the system administrators to ask them the same thing, but I don't know how much they know about R.)

Many thanks in advance, Elliott Moreton
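One quick, hedged way to gauge the size of the random-effects part of this model is to count the columns its formula generates. The sketch below uses a made-up grid of factor levels rather than the real data, and it assumes the random-effects covariance matrix is left unstructured; it does not diagnose the memory error itself.

```r
# Made-up grid containing every level combination; not the real data.
d <- expand.grid(Stop = 1:6, Son = 1:6, StopResp = 0:1)

# Design columns implied by the random-effects formula in the post.
Z <- model.matrix(~ 1 + (ordered(Stop) + ordered(Son)) * StopResp, data = d)

q     <- ncol(Z)            # random effects per subject
n_cov <- q * (q + 1) / 2    # free variances and covariances if unstructured

c(random_effects = q, covariance_parameters = n_cov)
```

With only 16 subjects, it may be worth comparing that parameter count with a simpler random-effects specification, for example a random intercept only, before looking at the hardware.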
Re: [R] rterm not shutting down from ESS on Win32/could we help?
Please move this over to ESS-help -- this is off-topic for many R users. Ritter, Christian C MCIL-CTGAS [EMAIL PROTECTED] writes: Hi to all who suffer from rterm not shutting down in xemacs/ESS on -- A.J. Rossini [EMAIL PROTECTED] http://www.analytics.washington.edu/ Biomedical and Health Informatics University of Washington Biostatistics, SCHARP/HVTN Fred Hutchinson Cancer Research Center UW : FAX=206-543-3461 | moving soon to a permanent office FHCRC: 206-667-7025 FAX=206-667-4812 | Voicemail is pretty sketchy/use Email
[R] General help pages
Uwe Ligges [EMAIL PROTECTED] wrote:

> Please follow that [?sort] reference and read ?Comparison as well.

There seem to be several very useful general help pages like ?Comparison, ?Devices, and ?Startup, which do not tie to a specific function. Is there a list of these? I'd like some bedtime reading.

-- David Brahm ([EMAIL PROTECTED])
Re: [R] General help pages
David Brahm wrote:

> Uwe Ligges [EMAIL PROTECTED] wrote:
> > Please follow that [?sort] reference and read ?Comparison as well.
> There seem to be several very useful general help pages like ?Comparison, ?Devices, and ?Startup, which do not tie to a specific function. Is there a list of these? I'd like some bedtime reading.

I don't know of a complete listing of such pages (well, what should be listed in there and what not?), but for some basic stuff, ?Syntax has some links in its "See Also" section.

Uwe Ligges
[R] Extracting from population estimation, individual estimations
Hello to everyone!

I fitted a nonlinear mixed model to a data frame. The data frame contained, among others, a variable Subject. For each subject there were several entries (measurements), exactly sixteen. I fitted the nonlinear mixed model using the variable Subject as a grouping factor in the random statement of the model. There were two parameters to be estimated.

The two estimates, however, are estimates for the population. I am looking for a way to extract from these population estimates the individual estimates of the parameters for each subject, i.e. the subject-level predictions of the parameters (not the predicted values of the response). There is, for example, a way to do this in SAS; is there one in R as well?

Thanks for answers.

Regards, Dassy
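If the model was fitted with nlme() from the nlme package (the post does not say which function was used, so this is an assumption), the extractor functions fixef(), ranef(), and coef() return the population estimates, the per-subject deviations, and the per-subject parameter estimates respectively. The sketch below uses the Loblolly example from the nlme help page rather than the poster's data.

```r
library(nlme)

# Example model from the nlme documentation: asymptotic growth of Loblolly pines,
# with a random effect on the asymptote for each Seed (the grouping factor).
fm <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
           data   = Loblolly,
           fixed  = Asym + R0 + lrc ~ 1,
           random = Asym ~ 1,
           start  = c(Asym = 103, R0 = -8.5, lrc = -3.3))

pop  <- fixef(fm)   # population (fixed-effect) estimates
dev  <- ranef(fm)   # estimated deviation of each group from the population value
indv <- coef(fm)    # per-group parameter estimates: fixed effects plus deviations
```

Here indv has one row per Seed; for the poster's model it would have one row per Subject and one column per parameter.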
[R] Re: Oja median
I am shocked and dismayed (and the term hasn't even started yet ;-) ) that none of you have turned in the weekend homework problem that I assigned last Friday. At the risk of further embarrassment I am posting an answer in the hope that this will inspire someone to suggest some improvements. In particular you will see that the loop in the function cofactors, when subjected to the apply, is painfully slow. Using Rprof on an example with n=50 and p=3 shows that 160 seconds of the 167 needed were spent in this apply. Yes, I'm aware that this could be recoded in [C, fortran, ...], but that would be considered cheating.

mean.wilks <- function(x){
  # Computes the column means of the matrix x -- very slowly.
  n <- dim(x)[1]
  p <- dim(x)[2]
  A <- t(combn(1:n, p))
  X <- NULL
  for(i in 1:p)
    X <- cbind(X, x[A[,i],])
  oja.ize <- function(v) cofactors(matrix(v, sqrt(length(v))))
  A <- t(apply(X, 1, oja.ize))
  coef(lm(-A[,1] ~ A[,-1] - 1))
}
cofactors <- function(A){
  B <- rbind(1, cbind(A, 1))
  p <- ncol(B)
  x <- rep(0, p)
  for(i in 1:p)
    x[i] <- ((-1)^(i+p)) * det(B[-i,-p])
  return(x)
}

Roger Koenker, Department of Economics, University of Illinois, Champaign, IL 61820
url: www.econ.uiuc.edu/~roger/my.html   email: [EMAIL PROTECTED]
vox: 217-333-4558   fax: 217-244-6678

On Fri, 15 Aug 2003, Roger Koenker wrote:

> I discovered recently that the phrase "Oja median" produces no hits in Jonathan Baron's very valuable R search engine. I found this surprising since I've long regarded this idea as one of the more interesting notions in the multivariate robustness literature. To begin to remedy this oversight I wrote a bivariate version and then decided that writing a general p-variate version might make a nice weekend programming puzzle for R-help.
>
> Here are a few more details: The Oja median of n p-variate observations minimizes over theta in R^p the sum of the volumes of the simplices formed by theta and p of the observed points, the sum being taken over all n choose p groups of p observations.
> Thus, in the bivariate case we are minimizing the sum of the areas of all triangles formed by the point theta and pairs of observations. Here is a simple bivariate implementation:
>
> oja.median <- function(x) {
>   # bivariate version -- x is assumed to be an n by 2 matrix
>   require(quantreg)
>   n <- dim(x)[1]
>   A <- matrix(rep(1:n, n), n)
>   i <- A[col(A) < row(A)]
>   j <- A[n + 1. - col(A) > row(A)]
>   xx <- cbind(x[i, ], x[j, ])
>   y <- xx[, 1] * xx[, 4] - xx[, 2] * xx[, 3]
>   z1 <- (xx[, 4] - xx[, 2])
>   z2 <- - (xx[, 3] - xx[, 1])
>   return(rq(y ~ cbind(z1, z2) - 1)$coef)
> }
>
> To understand the strategy, note that the area of the triangle formed by the points x_i = (x_i1, x_i2), x_j = (x_j1, x_j2), and theta = (theta_1, theta_2) is given by the determinant
>
>                                 | 1     1     1       |
>   Delta(x_i, x_j, theta) = .5   | x_i1  x_j1  theta_1 |
>                                 | x_i2  x_j2  theta_2 |
>
> Expanding the determinant in the unknown parameters theta gives the l1 regression formulation. Remarkably, a result of Wilks says that if the call to rq() is replaced with a call to lm() you get the sample mean -- this gives an impressively inefficient least squares regression based alternative to apply(x,2,mean)! It also provides a useful debugging check for proposed algorithms.
>
> Obviously, the expansion of the determinant gives the same formulation for p > 2; the challenge is to find a clean way to generate the design matrix and response vector for the general setting.
>
> Bon weekend!
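The debugging check mentioned in the message (swap rq() for lm() and you should recover the sample mean) can be sketched for the bivariate case as follows. This is an illustrative rewrite, not the code from the post: wilks_mean is a hypothetical name, the index pairs come from combn() rather than the row()/col() construction, and the data are simulated.

```r
# Hypothetical helper: least-squares analogue of the bivariate Oja median.
# By the Wilks result quoted above, it should return the column means of x.
wilks_mean <- function(x) {
  stopifnot(is.matrix(x), ncol(x) == 2)
  pairs <- combn(nrow(x), 2)              # all index pairs (i, j) with i < j
  xi <- x[pairs[1, ], , drop = FALSE]
  xj <- x[pairs[2, ], , drop = FALSE]
  # Expanding the 3 x 3 determinant in theta gives y - z1*theta_1 - z2*theta_2
  y  <- xi[, 1] * xj[, 2] - xi[, 2] * xj[, 1]
  z1 <- xj[, 2] - xi[, 2]
  z2 <- -(xj[, 1] - xi[, 1])
  unname(coef(lm(y ~ cbind(z1, z2) - 1)))
}

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)          # simulated 20 x 2 sample
rbind(wilks = wilks_mean(x), colMeans = colMeans(x))
```

Replacing lm() with quantreg's rq() in the last line of the function would give the bivariate Oja median itself, as in the posted oja.median().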
Re: [R] General help pages
Following Uwe's and Dirk's suggestions, here's a first pass at a list of interesting general help pages:

  ?Arithmetic
  ?Comparison
  ?Control
  ?DateTimeClasses
  ?Defunct
  ?Deprecated
  ?Devices
  ?Extract
  ?Foreign
  ?Logic
  ?Memory
  ?Paren
  ?Rdconv       (RdUtils page: Rdconv, Rd2dvi, Rd2txt, Sd2Rd)
  ?Special      (beta, gamma, choose, ...)
  ?Startup
  ?Syntax
  ?build        (PkgUtils page: R CMD build, R CMD check)
  ?connections  (file, pipe, ...)
  ?pi           (Constants page: LETTERS, letters, month.abb, month.name, pi)

Note that the RdUtils, PkgUtils, and Constants pages cannot be accessed through their names (?RdUtils, etc).

I'd also suggest two additional general pages (which do not currently exist):

  ?System    (system, .Platform, Sys.info, Sys.getenv, Sys.putenv, getwd, setwd)
  ?Graphics  (covering plot, lines, points, segments, par, Devices)

-- David Brahm ([EMAIL PROTECTED])
[R] R and Poisson
Hi,

I wonder if anyone can answer the following or point me in the direction of how to obtain answers to the questions. Below is output from R, and further down are the questions raised and an explanation of the study.

Output from R:

glm(formula = CB95TO00 ~ URB + INC, family = poisson)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.2272  -1.1290   0.2709   0.4272   2.1376

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.30621    0.13499  -2.268   0.0233 *
URB2         0.02253    0.16826   0.134   0.8935
URB3        -0.00936    0.15263  -0.061   0.9511
INC2        -0.14430    0.12342  -1.169   0.2423
INC3        -0.55092    0.31351  -1.757   0.0789 .
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 403.97 on 420 degrees of freedom
Residual deviance: 399.61 on 416 degrees of freedom
AIC: 883.8

Number of Fisher Scoring iterations: 4

Explanation and questions raised.

The dependent variable is the number of children born in the last 5 years (values range from 0 to 3). Distribution of the dependent variable (named CB95TO00):

  0   203
  1   157
  2    59
  3     2

Predictors are:
  Level of Urbanisation (3 categories: 1: Rural; 2: Semi-Urban; 3: Urban)
  Income Level (3 categories: 1: Low; 2: Medium; 3: High)

The questions are:

1) How does one interpret the coefficients in the output? Our interpretation is that Urb2 compared to Urb1 gives an estimate of .02253; Urb3 compared to Urb1 gives a parameter estimate of -.00936, etc. Neither of these shows significance. How does one interpret this exactly with regard to the dependent variable, which is the number of children?
2) How does one interpret the intercept, which shows significance?
3) What do the Null Deviance and the Residual Deviance tell us?
4) What does the AIC tell us?
5) Is it possible to obtain goodness-of-fit statistics such as the Pearson Chi-Square and Log-Likelihood, similar to what SAS statistical software gives?
6) Is it possible to find out in R whether Urbanisation and Income are significant overall?
Thanks in advance for any assistance, Regards, Paul == Paul McGeoghan, Application support specialist (Statistics and Databases), Information Services, Cardiff University. Tel. 02920 (875035).
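Some of the quantities asked about in questions 1, 5, and 6 can be obtained with standard extractor functions. The sketch below is hedged: the survey data are not available here, so the fit uses simulated counts with the same variable names, and only the last line uses numbers taken from the posted output.

```r
# Simulated stand-in for the survey (NOT the real data): 421 households.
set.seed(42)
n <- 421
d <- data.frame(URB = factor(sample(1:3, n, replace = TRUE)),
                INC = factor(sample(1:3, n, replace = TRUE)))
d$CB95TO00 <- rpois(n, lambda = 0.65)

fit <- glm(CB95TO00 ~ URB + INC, family = poisson, data = d)

# Q1: coefficients are on the log scale; exp() turns them into rate ratios
# relative to the baseline categories (URB = 1, INC = 1).
rate_ratios <- exp(coef(fit))

# Q5: Pearson chi-square statistic and log-likelihood.
pearson <- sum(residuals(fit, type = "pearson")^2)
ll      <- logLik(fit)

# Q6: overall tests for each factor (sequential, then each term given the other).
seq_tests  <- anova(fit, test = "Chisq")
term_tests <- drop1(fit, test = "Chisq")

# From the posted output: drop in deviance for both factors jointly.
p_overall <- pchisq(403.97 - 399.61, df = 420 - 416, lower.tail = FALSE)
```

With the posted deviances, p_overall comes out at roughly 0.36, which is in line with none of the individual coefficients for URB and INC being significant.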
[R] R Install on Solaris 9
I am trying to install R-1.7.1 or R-1.6.2 on solaris 9 but the configure is failing on me: Below is the error. Anybody with similar experience out there? Your help will be appreciated highly! checking for an ANSI C-conforming const... yes checking for int... yes checking size of int... configure: error: cannot compute sizeof (int), 77 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Monday, August 18, 2003 2:44 PM To: Akpodigha Filatei Subject: Welcome to the R-help mailing list Welcome to the [EMAIL PROTECTED] mailing list!
Re: [R] type I and type III sums of squares
Paul - Your question is best answered by a textbook reference, because that will supply all the context needed to fully answer your question. A good, basic reference is: George W. Snedecor and William G. Cochran (1980) Statistical Methods, 7th edition. Iowa State Univ. Press. ISBN: 0-8138-1560-6; LC: QA 276.12 .S591 1980 (I have the Taubman copy already checked out - others in the Science Library.) A more advanced reference is: George A. Milliken and Dallas E. Johnson (1984) Analysis of messy data (2 vols.) Van Nostrand Reinhold, NY ISBN: 0-534-02713-x; LC: QA 279 .M481 1984 (Science library only, more recent edition in Public Health library.) The terms type I and type III are specific to SAS software. Their precise definitions are given in the SAS documentation. I don't have a copy handy. George Milliken was a contributor to the SAS software, so his definitions will coincide with SAS's. HTH - tom blackwell - program in bioinformatics and department of human genetics - u michigan medical school - ann arbor - On Mon, 18 Aug 2003, Paul Litvak wrote: I have been digging around in the FAQ's and online looking for an answer to my questions, and perhaps someone here can help me. For a statistical experiment, I need to run 3,000,000 ANOVAs, which is taking me a very long time. As a result, I have recoded my analyses in C. However, I cannot find the formula to calculate either the type I or type III sums of squares (in the case of my model, the two are equivalent). I know that the formula must be in the R source code, as they are able to calculate it, but I am not sure where. Does anyone know where I can find the explicit procedure for calculating this? A mathematical formula or the source code would be equally helpful. I am aware of the formula in matrix algebra, but is there a formulation that does not use matrix algebra? 
thanks very much in advance, Paul Litvak Department of Human Genetics University of Michigan
[R] Responses to question about order()
Thanks to all for your responses regarding my question of why order() was behaving differently for me on my Linux and Solaris platforms. It seems that order() behaves according to the collating sequence and character set that the particular platform assumes. I mistakenly assumed that the function would behave the same regardless of the platform. I have briefly summarized the responses I received.

Don MacQueen suggested that I try creating a text file outside of R with each of the characters on a separate line, to see if the different operating systems' sort command on the file produced different results.

James Holtman and Uwe Ligges both pointed to the fact that the character set and collating sequence could be different on each platform, which would affect the functionality of order().

Richard A. O'Keefe suggested that I check the settings of each environment by looking at the variables LC_COLLATE and/or LANG. He also suggested a C program for me to run to check for plausible answers.

-Raja
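The dependence on the collating sequence summarized above can be checked, and removed, from within R. A minimal sketch, assuming a small made-up character vector: Sys.getlocale() shows the current setting, and switching LC_COLLATE to "C" gives plain byte-order sorting on any platform.

```r
x <- c("b", "A", "a", "B")

old <- Sys.getlocale("LC_COLLATE")     # what this platform currently uses
sorted_default <- sort(x)              # result depends on that setting

Sys.setlocale("LC_COLLATE", "C")       # byte order: upper case before lower case
sorted_c <- sort(x)
Sys.setlocale("LC_COLLATE", old)       # restore the original setting

# The radix method is documented to use C-locale ordering for character vectors.
sorted_radix <- sort(x, method = "radix")
```

Comparing sorted_default on the Linux and Solaris machines should show the difference the thread was about, while sorted_c should be identical on both.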
RE: [R] type I and type III sums of squares
Not knowing any more details about your experiment and data, we can only speculate. If the reason (or part of the reason) that you need to run ANOVA 3 million times is that you have that many responses collected from the same experiment (or several experiments, but not 3 million different experiments), you should be able to do the ANOVA computation in R very efficiently. E.g., assuming you actually have one experiment with 3M responses, you can compute the hat matrix once and apply it to the response matrix, rather than computing the same hat matrix 3M times. Just a thought. HTH. Andy
-----Original Message----- From: Paul Litvak [mailto:[EMAIL PROTECTED] Sent: Monday, August 18, 2003 2:18 PM To: [EMAIL PROTECTED] Subject: [R] type I and type III sums of squares Hello- I have been digging around in the FAQ's and online looking for an answer to my questions, and perhaps someone here can help me. For a statistical experiment, I need to run 3,000,000 ANOVAs, which is taking me a very long time. As a result, I have recoded my analyses in C. However, I cannot find the formula to calculate either the type I or type III sums of squares (in the case of my model, the two are equivalent). I know that the formula must be in the R source code, as they are able to calculate it, but I am not sure where. Does anyone know where I can find the explicit procedure for calculating this? A mathematical formula or the source code would be equally helpful. I am aware of the formula in matrix algebra, but is there a formulation that does not use matrix algebra? thanks very much in advance, Paul Litvak Department of Human Genetics University of Michigan
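Andy's suggestion can be sketched as follows. This is only an illustration under assumed conditions: one shared design (here a made-up three-group factor) and a matrix Y with one response per column. Instead of forming the hat matrix explicitly, the sketch factorizes the design once with qr() and projects all responses in a single call, which gives the same fitted values.

```r
set.seed(42)
n <- 12                      # observations per response
m <- 5                       # number of responses (3 million in the question)
group <- gl(3, 4)            # hypothetical design: 3 groups of 4
X <- model.matrix(~ group)   # design matrix shared by every response
Y <- matrix(rnorm(n * m), nrow = n, ncol = m)

qr_x <- qr(X)                        # factorize the design once
fitted_all <- qr.fitted(qr_x, Y)     # project every column of Y at once

ss_resid <- colSums((Y - fitted_all)^2)
ss_total <- colSums(sweep(Y, 2, colMeans(Y))^2)
ss_model <- ss_total - ss_resid

df_model <- qr_x$rank - 1
df_resid <- n - qr_x$rank
f_stats  <- (ss_model / df_model) / (ss_resid / df_resid)
p_values <- pf(f_stats, df_model, df_resid, lower.tail = FALSE)
```

Each element of f_stats is the overall F statistic for one response, so the per-response cost is a few column sums rather than a full model fit.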
[R] FYI: Article on R at IBM's developerWorks Server Clinic
Hi all, I happened to be reviewing a Linux web site that I frequent (http://www.pclinuxonline.com/index.php) and noted today an entry for an article on R at IBM's developerWorks Server Clinic site located at http://www-106.ibm.com/developerworks/linux/library/l-sc16.html. I thought that I would pass this on as an FYI. Regards, Marc Schwartz
Re: [R] R and Poisson
I think that Peter Dalgaard's book Introductory Statistics with R will help, especially for some of the generic functions, even though his chapter 11 is for logistic regression, not specifically the Poisson case. His book contains further references. I think you may find that help(anova.glm) will provide some insight too.

On Mon, 18 Aug 2003, Paul Mcgeoghan wrote:

Hi, I wonder if anyone can answer the following or point me in the direction of how to obtain answers to the questions. Below is Output from R and further down are the questions raised and explanation of the study.

Output from R:

    glm(formula = CB95TO00 ~ URB + INC, family = poisson)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.2272  -1.1290   0.2709   0.4272   2.1376

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -0.30621    0.13499  -2.268   0.0233 *
    URB2         0.02253    0.16826   0.134   0.8935
    URB3        -0.00936    0.15263  -0.061   0.9511
    INC2        -0.14430    0.12342  -1.169   0.2423
    INC3        -0.55092    0.31351  -1.757   0.0789 .
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

    (Dispersion parameter for poisson family taken to be 1)

        Null deviance: 403.97  on 420  degrees of freedom
    Residual deviance: 399.61  on 416  degrees of freedom
    AIC: 883.8

    Number of Fisher Scoring iterations: 4

Explanation and Questions raised.

The dependent variable is: Number of children born in last 5 years (values range from 0 to 3). Distribution of dependent variable (named CB95TO00):

    0   203
    1   157
    2    59
    3     2

Predictors are: Level of Urbanisation (3 categories: 1: Rural; 2: Semi-Urban; 3: Urban) and Income Level (3 categories: 1: Low; 2: Medium; 3: High).

The questions are:

1) How does one interpret the coefficients in the output? Our interpretation is Urb2 compared to Urb1 gives an estimate of .02253; Urb3 compared to Urb1 gives a parameter estimate of -.00936 etc. Neither of these shows significance. How does one interpret this exactly with regards to the dependent variable which is Number of children?
2) How does one interpret the intercept which shows significance?
3) What does the Null Deviance tell us and the Residual Deviance?
4) What does the AIC tell us?
5) Is it possible to obtain goodness of fit statistics such as Pearson ChiSquare and Log-Likelihood similar to what SAS statistical software gives?
6) Is it possible to find out if Urbanisation and Income are significant overall in R?

Thanks in advance for any assistance, Regards, Paul

== Paul McGeoghan, Application support specialist (Statistics and Databases), Information Services, Cardiff University. Tel. 02920 (875035).

-- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Breiviksveien 40, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93 e-mail: [EMAIL PROTECTED]
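As a rough illustration of questions 1), 3), 5) and 6), the sketch below fits a Poisson model of the same shape to simulated data (the real data are not in the thread, so the variables urb, inc and y and the simulated effect sizes are assumptions) and pulls out the quantities asked about. With the default treatment contrasts the coefficients are on the log scale, so exp() turns them into rate ratios relative to the first level; for the output quoted above, exp(0.02253) is about 1.023 and exp(-0.30621) is about 0.736.

```r
set.seed(1)
n   <- 421
urb <- factor(sample(1:3, n, replace = TRUE))   # hypothetical urbanisation level
inc <- factor(sample(1:3, n, replace = TRUE))   # hypothetical income level
y   <- rpois(n, lambda = exp(-0.3 - 0.3 * (inc == 3)))

fit <- glm(y ~ urb + inc, family = poisson)

rate_ratios <- exp(coef(fit))                    # 1) multiplicative effects on the mean count
pearson_chi2 <- sum(residuals(fit, type = "pearson")^2)   # 5) Pearson chi-square
log_lik <- logLik(fit)                                    # 5) log-likelihood
overall <- drop1(fit, test = "Chisq")            # 6) one likelihood-ratio test per factor

# 3) Null minus residual deviance tests all predictors jointly.
p_model <- pchisq(fit$null.deviance - fit$deviance,
                  df = fit$df.null - fit$df.residual,
                  lower.tail = FALSE)

# The same test using the deviances quoted in the message above.
p_source <- pchisq(403.97 - 399.61, df = 420 - 416, lower.tail = FALSE)  # about 0.36
```

anova(fit, test = "Chisq") gives the sequential version of the tests in 6), which is what help(anova.glm) describes.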
RE: [R] FYI: Article on R at IBM's developerWorks Server Clinic
I hesitate to ask, but in the IBM article it states: "...R might be statistical rather than scientific in some pedantic sense..." Why is that distinction necessary? -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marc Schwartz Sent: Monday, August 18, 2003 2:18 PM To: [EMAIL PROTECTED] Subject: [R] FYI: Article on R at IBM's developerWorks Server Clinic Hi all, I happened to be reviewing a Linux web site that I frequent (http://www.pclinuxonline.com/index.php) and noted today an entry for an article on R at IBM's developerWorks Server Clinic site located at http://www-106.ibm.com/developerworks/linux/library/l-sc16.html. I thought that I would pass this on as an FYI. Regards, Marc Schwartz
Re: [R] glmmPQL() and memory limitations
Elliott - I don't know if you've had any other responses off-list yet; none have shown up on the r-help mailing list during the day today. I'm really NOT the most expert person to answer this, but I'll give it a try. Your option (1) seems entirely possible to me. Let me do some thinking out loud to see how the numbers add up. The design matrix for the fixed effects should have dimensions 11,000 rows x ((6 + 6) * 2) = 24 columns = 264 K values. The design matrix for the random effects might have dimensions 11,000 rows x ((1 + (6 + 6) * 2) * 16) = 400 columns = 4.4 M values. Say 4.7 M values, total. At worst, these will be stored as 8-byte double precision numbers (they very likely are), so 38 Mb for one copy of the logistic regression problem. Ah, but then I look at the error message you quote below, and there's some single object of 62 Mb that R is manipulating. My calculation above is low by a factor of 1.6 or so. R wants quite a lot of space to turn around in. I usually figure 4 copies of the data just to do the simplest arithmetic and assign the result. The function glmmPQL() might be keeping 10 or 20 copies of the regression problem around - but that's only 760 Mb (assuming 38 Mb each), so if that were all, you would be okay. If each node is running an instance of the problem on both processors, then they have only 1 Gb each, and you're pretty close to the limit, including R's overhead and the operating system overhead. If there's a way to keep one processor empty on each node, that would double the memory available to each instance of the problem (but it ONLY doubles it). I observe that 11,000 rows > 6 * 6 * 2 * 16 = 1152. That suggests there might be a way to collapse multiple Bernoulli outcomes at the same combination of Subject, Stop, Son and StopResp into a binomial outcome (# successes, # failures) as for glm(). I don't know whether glmmPQL() supports this response data format. (See Details in help(glm) to see what I'm talking about.)
If you are able to do this, it could reduce the size of the random factor design matrix proportionately. For single-processor implementations of R, the information you might want is on the help pages help(Memory) and help(gc). I've NO experience with threaded versions and how they behave. Always, the error message you quote below only describes the last allocation event which failed. It doesn't tell you what the total that was successfully allocated in previous tries is. So it's not just the first call for 62 Mb which fails. Guess I've come to the end of whatever slight help I can offer. Please do come back and tell us what the ultimate outcome on this question turns out to be. And, if you have had other off-list responses during the day, you might summarize them in an email back to the list so that the rest of us know that your question is being dealt with appropriately. - tom blackwell - u michigan medical school - ann arbor - On Mon, 18 Aug 2003, Elliott Moreton wrote: When running glmmPQL(), I keep getting errors like Error: cannot allocate vector of size 61965 Kb Execution halted This is R-1.7.1. The data set consists of about 11,000 binary responses from 16 subjects. The model is fixed = SonResp ~ (ordered (Stop) + ordered (Son)) * StopResp, random = ~ 1 + (ordered (Stop) + ordered (Son)) * StopResp | Subj family = binomial (link = logit) SonResp and StopResp are binary; Stop and Son are ordered factors with six levels each. The machine I'm running this on is my university's scientific server, a Beowulf Linux cluster; the machine this job would be running on would have two 1.4 GHz CPUS, a 2-gigabyte RAM, and an 18-gigabyte hard disk, plus 130 gigabytes of scratch file space; it would be running Red Hat Linux 7.2 with XFS. 
Can anyone tell me whether this is (1) a problem with the model (no machine could fit it in the lifetime of the universe), (2) a problem with how I formulated the model (there's a way to get the same end result without overflowing memory), (3) a problem with glmmPQL() (that could be fixed by using some other package), (4) a problem with the machine I'm running it on (need more real or virtual memory), or (5) other? (Naturally, I've contacted the system administrators to ask them the same thing, but I don't know how much they know about R.) Many thanks in advance, Elliott Moreton
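The back-of-envelope memory estimate and the suggested collapse to binomial counts can be sketched in R as below. The column counts follow Tom's rough figures rather than an exact model matrix, and the data frame d is simulated with the variable names from the question; whether glmmPQL() accepts the collapsed (successes, failures) form is left open in the thread and is not assumed here.

```r
# Memory estimate, using the rough column counts from the reply.
n_rows      <- 11000
fixed_cols  <- (6 + 6) * 2                 # 24
random_cols <- (1 + (6 + 6) * 2) * 16      # 400
n_values    <- n_rows * (fixed_cols + random_cols)
mb_per_copy <- n_values * 8 / 1e6          # 8 bytes per double; about 37 Mb

# Simulated stand-in for the real data (values are made up).
set.seed(7)
d <- data.frame(
  Subj     = factor(sample(1:16, n_rows, replace = TRUE)),
  Stop     = factor(sample(1:6, n_rows, replace = TRUE), ordered = TRUE),
  Son      = factor(sample(1:6, n_rows, replace = TRUE), ordered = TRUE),
  StopResp = rbinom(n_rows, 1, 0.5),
  SonResp  = rbinom(n_rows, 1, 0.5)
)

# Collapse Bernoulli rows into counts per Subj x Stop x Son x StopResp cell.
d$succ <- d$SonResp
d$fail <- 1 - d$SonResp
cells <- aggregate(cbind(succ, fail) ~ Subj + Stop + Son + StopResp,
                   data = d, FUN = sum)
max_cells <- 16 * 6 * 6 * 2                # 1152 possible combinations
```

The collapsed table has at most 1152 rows instead of 11,000, and cbind(succ, fail) is the two-column response form described under Details in help(glm).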
[R] princomp scores reproduced
Hi, I used princomp for PCA based on the correlation matrix (cor=T). I would like to reproduce the scores for each observation by first standardizing the data matrix (mean=0, std dev=1), and then multiplying by the loadings of each variable for each principal component. I get very close numbers, but not exactly the same. Anything I missed here? Thanks
Re: [R] princomp scores reproduced
You might get a more informative response if you provide a toy example with data and what you did that almost but not quite produced the correct numbers. The easiest questions to answer often come with data and sample code that can be copied directly from an email message and pasted into R. Then someone can experiment with alternative ways of doing something without taking much time away from the things they are paid to do. Otherwise, we can only guess what you did that did not produce the answer. There are an infinite number of ways to do anything wrong, and often even an infinite number of ways to get almost the right answer. Sorry I couldn't be more helpful. spencer graves array chip wrote: Hi, I used princomp for PCA based on the correlation matrix (cor=T). I would like to reproduce the scores for each observation by first standardizing the data matrix (mean=0, std dev=1), and then multiplying by the loadings of each variable for each principal component. I get very close numbers, but not exactly the same. Anything I missed here? Thanks
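A toy example of the kind Spencer asks for, on simulated data. One plausible cause of "very close but not exactly the same" is the divisor: princomp() documents that it uses divisor N, whereas sd() and scale() use N - 1. This is offered as a likely explanation, not a diagnosis of the poster's actual code.

```r
set.seed(1)
X <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)   # made-up data
pc <- princomp(X, cor = TRUE)
L <- unclass(pc$loadings)                          # plain loadings matrix

# Standardize with the divisor-N standard deviation, as princomp does.
sd_n <- apply(X, 2, function(v) sqrt(mean((v - mean(v))^2)))
Z_n <- scale(X, center = colMeans(X), scale = sd_n)
scores_n <- Z_n %*% L

# Standardize with the usual divisor N - 1 (what scale() does by default).
scores_n1 <- scale(X) %*% L

matches_n  <- isTRUE(all.equal(scores_n,  pc$scores, check.attributes = FALSE))
matches_n1 <- isTRUE(all.equal(scores_n1, pc$scores, check.attributes = FALSE))
```

Here matches_n is TRUE while matches_n1 is FALSE; the two sets of scores differ by the constant factor sqrt((N - 1) / N), which is close to 1 for moderate N.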
Re: [R] Books for R
[EMAIL PROTECTED] wrote: Then the best book is Venables & Ripley (2002) for R. A.S. Agreed. V&R (2002) also cites Bates & Pinheiro's Mixed Effects Models in S and S-PLUS (2000), which is particularly good for linear and non-linear mixed-effects models. Highly recommended. Cheers Jason -- Indigo Industrial Controls Ltd. 64-21-343-545 [EMAIL PROTECTED]
[R] Command line R / PHP?
Greetings All, Just a quick query about calling R. Looking through the manual, you start R with $ R, and then start calling R functions, e.g. plot or whatever. Sounds pretty funky, and R looks to be *the* open source maths package. Awesome ... I would like to call R from my favourite glue language, PHP (rather than call perl which calls R), if possible. Calling R from the command line is all this would require, and this also seems quite possible :: Batch use: At its simplest, Rterm.exe can be used in a batch mode by commands like Rterm.exe --no-restore --no-save < infile > outfile And there is more information on this in section B.1 Invoking R under UNIX. Cool. However, what I can't find is how to specify what function I want to run on my infile, say calculating a standard deviation, means or a linear regression, and translating t-statistics into probabilities. Have I missed this in the docs? Any suggestions are greatly appreciated. Awesome and a big thanks. Z.
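In batch mode the infile is itself a file of R commands, so the functions to run are simply written in it. A minimal sketch of such a script follows; the file name analysis.R and the inline data are hypothetical, and in real use the data would be read from a file, for example with read.csv(). It could then be run with something like R --no-save < analysis.R > results.txt, and PHP would read results.txt.

```r
# analysis.R (hypothetical script name)
# Inline toy data; replace with e.g. read.csv("mydata.csv") in real use.
dat <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))

y_mean <- mean(dat$y)
y_sd   <- sd(dat$y)

fit <- lm(y ~ x, data = dat)                       # simple linear regression
t_stat <- coef(summary(fit))["x", "t value"]

# Two-sided p-value for the slope from its t-statistic.
p_val <- 2 * pt(abs(t_stat), df = fit$df.residual, lower.tail = FALSE)

# Plain-text output that the calling program can parse.
cat("mean", y_mean, "\n")
cat("sd", y_sd, "\n")
cat("slope", coef(fit)[["x"]], "\n")
cat("t", t_stat, "\n")
cat("p", p_val, "\n")
```

Writing one labelled value per line with cat() keeps the output easy to split on whitespace from the calling side.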