Re: [R] Multicore computation in Windows network: How to set up Rmpi
Samu, thanks, you are a life saver. Your efforts saved me a ton of time. I went through essentially the same process as you describe, and to summarize, I see the same phenomenon: spawning does not appear to happen the same way every time. How can you specify where the slaves are spawned? Did you ever figure this out? I see almost no documentation anywhere on Windows deployments. Thanks, Tom

Samu Mäntyniemi wrote: Hello! I finally got MPICH 1.06 + R 2.6.1 + Rmpi 0.5-5 working with multiple computers. The key was to realize that the number of processes should be one when launching Rgui using mpiexec, not the number of master + slaves, as I had first wrongly understood. However, I seem to have a new problem which I have not been able to figure out: after loading Rmpi, the first attempt at mpi.spawn.Rslaves() always spawns the slaves on the local machine instead of on both machines. If I close the slaves and spawn again, then one slave gets spawned on the remote machine. Each time I close and then spawn again, the order of machines is different, and eventually I get back to the situation where all slaves are on the local machine. Repeated spawn-close cycles seem to reveal a pattern. I see similar behaviour with more than two machines; it just takes more spawn-close cycles before slaves have been spawned on all my slave machines. Below is an example session with two machines. This pattern shows every time I start R and run this script. How can I control the spawning so that I get everything right on the first call of mpi.spawn.Rslaves()? Regards, Samu -- View this message in context: http://www.nabble.com/Multicore-computation-in-Windows-network%3A-How-to-set-up-Rmpi-tp14436938p16465801.html Sent from the R help mailing list archive at Nabble.com.
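One thing that may be worth trying (a hedged sketch only, not verified on this exact MPICH/Rmpi combination): MPICH's mpiexec can be pointed at a machine file listing the hosts on which processes may be placed, so slave placement need not be left to the process manager's default round-robin. The file name `mpd.hosts`, the host names, and the R installation path below are all assumptions for illustration, not details from this thread:

```shell
REM hosts file (assumed name mpd.hosts), one machine per line:
REM   node01
REM   node02
REM launch a single master process, as Samu describes, but with an
REM explicit machine file (hedged: check your MPICH version's mpiexec help)
mpiexec -machinefile mpd.hosts -n 1 "C:\Program Files\R\R-2.6.1\bin\Rgui.exe"
```

Inside R one would then load Rmpi and call mpi.spawn.Rslaves() as before; whether the spawned slaves honour the machine file ordering depends on the MPI implementation.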
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] summary(object, test=c(Roy, Wilks, Pillai, ....) AND ellipse(object, center=....)
Dear All, I would be very appreciative of your help with the following. 1). I am running multivariate multiple regression through the manova() function (kindly suggested by Professor Venables) and getting two different answers for test=c("Wilks","Roy","Pillai") and tests=c("Wilks","Roy","Pillai") as shown below. In the first case (test=c(list)) I got an error message, which probably means I can only call one test at a time. I thought I could get rid of this by adding an s to test; in that case (tests=c(list)), I got the Pillai test. Does this mean that Pillai is the default test and summary(manova()) can only report one test at a time?

> summary(manova(cbind(y1, y2) ~ z1, data = ex7.8),
+         test = c("Wilks", "Roy", "Pillai"))
Error in match.arg(test) : 'arg' must be of length 1
> summary(manova(cbind(y1, y2) ~ z1, data = ex7.8),
+         tests = c("Wilks", "Roy", "Pillai"))
          Df Pillai approx F num Df den Df Pr(>F)
z1         1 0.9375       15      2      2 0.0625 .
Residuals  3
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2). My next struggle is to construct a prediction ellipse. Neither ellipse() nor ellipse.lm() gives me the solution to "Sampling from multivariate multiple regression prediction regions" posted by Iain Pardoe, Mon May 9 18:43:46 2005.
I am working on the same problem and performed all the steps he suggested:

> ex7.10 <- data.frame(y1 = c(141.5, 168.9, 154.8, 146.5, 172.8, 160.1, 108.5),
+                      y2 = c(301.8, 396.1, 328.2, 307.4, 362.4, 369.5, 229.1),
+                      z1 = c(123.5, 146.1, 133.9, 128.5, 151.5, 136.2, 92),
+                      z2 = c(2.108, 9.213, 1.905, .815, 1.061, 8.603, 1.125))
> attach(ex7.10)
> f.mlm <- lm(cbind(y1, y2) ~ z1 + z2)
> y.hat <- c(1, 130, 7.5) %*% coef(f.mlm)
> round(y.hat, 2)
         y1     y2
[1,] 151.84 349.63
> qf.z <- t(c(1, 130, 7.5)) %*%
+   solve(t(cbind(1, z1, z2)) %*% cbind(1, z1, z2)) %*%
+   c(1, 130, 7.5)
> round(qf.z, 5)
        [,1]
[1,] 0.36995
> n.sigma.hat <- SSD(f.mlm)$SSD   # same as t(resid(f.mlm)) %*% resid(f.mlm)
> round(n.sigma.hat, 2)
     y1    y2
y1 5.80  5.22
y2 5.22 12.57
> F.quant <- qf(.95, 2, 3)
> round(F.quant, 2)
[1] 9.55

From here, how could I calculate a 95% prediction ellipse for y=(y1,y2) at (z1,z2)=(130,7.5) using either ellipse or ellipse.lm? y1 would be the x-axis and y2 the y-axis. The center is different from (0,0), and I don't know what the appropriate x (the lm object) would be. Should I use predicted values or residuals? In both cases I have vectors, which is different from the example given with ellipse.lm. 3). Last but not least, would it be too ambitious to draw the axes (i.e., the eigenvalues) onto the ellipse? Thanks and very kind regards, Ray
Re: [R] summary(object, test=c(Roy, Wilks, Pillai, ....) AND ellipse(object, center=....)
Reading the help often resolves questions, which is why we ask you to do so before posting. From ?summary.manova:

Usage:
  ## S3 method for class 'manova':
  summary(object,
          test = c("Pillai", "Wilks", "Hotelling-Lawley", "Roy"),
          intercept = FALSE, ...)

Arguments:
  object: An object of class 'manova' or an 'aov' object with multiple responses.
  test: The name of the test statistic to be used. Partial matching is used so the name can be abbreviated.
...
The 'summary.manova' method uses a multivariate test statistic for the summary table. Wilks' statistic is most popular in the literature, but the default Pillai-Bartlett statistic is recommended by Hand and Taylor (1987).

Note: no mention of 'tests', and explicitly 'a ... test' (singular). It is a standard convention that when several alternatives are given in the usage the first is the default, _and_ the text does say this explicitly. (I might have followed up your next question had you given a URL link to the posting concerned.)

On Thu, 3 Apr 2008, Ray Haraf wrote: I would be very appreciative of your help with the following. 1). I am running multivariate multiple regression through the manova() function (kindly suggested by Professor Venables) and getting two different answers for test=c("Wilks","Roy","Pillai") and tests=c("Wilks","Roy","Pillai") as shown below. In the first case (test=c(list)) I got an error message, which probably means I can only call one test at a time. I thought I could get rid of this by adding an s to test; in that case (tests=c(list)), I got the Pillai test. Does this mean that Pillai is the default test and summary(manova()) can only report one test at a time?

> summary(manova(cbind(y1, y2) ~ z1, data = ex7.8),
+         test = c("Wilks", "Roy", "Pillai"))
Error in match.arg(test) : 'arg' must be of length 1
> summary(manova(cbind(y1, y2) ~ z1, data = ex7.8),
+         tests = c("Wilks", "Roy", "Pillai"))
          Df Pillai approx F num Df den Df Pr(>F)
z1         1 0.9375       15      2      2 0.0625 .
Residuals  3
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2).
My next struggle is to construct a prediction ellipse. Neither ellipse() nor ellipse.lm() gives me the solution to "Sampling from multivariate multiple regression prediction regions" posted by Iain Pardoe, Mon May 9 18:43:46 2005. I am working on the same problem and performed all the steps he suggested:

> ex7.10 <- data.frame(y1 = c(141.5, 168.9, 154.8, 146.5, 172.8, 160.1, 108.5),
+                      y2 = c(301.8, 396.1, 328.2, 307.4, 362.4, 369.5, 229.1),
+                      z1 = c(123.5, 146.1, 133.9, 128.5, 151.5, 136.2, 92),
+                      z2 = c(2.108, 9.213, 1.905, .815, 1.061, 8.603, 1.125))
> attach(ex7.10)
> f.mlm <- lm(cbind(y1, y2) ~ z1 + z2)
> y.hat <- c(1, 130, 7.5) %*% coef(f.mlm)
> round(y.hat, 2)
         y1     y2
[1,] 151.84 349.63
> qf.z <- t(c(1, 130, 7.5)) %*%
+   solve(t(cbind(1, z1, z2)) %*% cbind(1, z1, z2)) %*%
+   c(1, 130, 7.5)
> round(qf.z, 5)
        [,1]
[1,] 0.36995
> n.sigma.hat <- SSD(f.mlm)$SSD   # same as t(resid(f.mlm)) %*% resid(f.mlm)
> round(n.sigma.hat, 2)
     y1    y2
y1 5.80  5.22
y2 5.22 12.57
> F.quant <- qf(.95, 2, 3)
> round(F.quant, 2)
[1] 9.55

From here, how could I calculate a 95% prediction ellipse for y=(y1,y2) at (z1,z2)=(130,7.5) using either ellipse or ellipse.lm? y1 would be the x-axis and y2 the y-axis. The center is different from (0,0), and I don't know what the appropriate x (the lm object) would be. Should I use predicted values or residuals? In both cases I have vectors, which is different from the example given with ellipse.lm. 3). Last but not least, would it be too ambitious to draw the axes (i.e., the eigenvalues) onto the ellipse? Thanks and very kind regards, Ray

-- Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
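To make the point from the help page concrete, here is a minimal self-contained session (data invented for illustration) showing that summary.manova takes a single test name, defaulting to the first choice, with partial matching:

```r
# invented two-response data with one grouping variable
ex <- data.frame(y1 = c(1.2, 2.3, 1.9, 3.1, 2.8, 3.6),
                 y2 = c(0.8, 1.1, 1.4, 2.0, 2.2, 2.9),
                 z1 = factor(rep(c("A", "B"), each = 3)))
fit <- manova(cbind(y1, y2) ~ z1, data = ex)
summary(fit)                  # default test: Pillai (first in the choices)
summary(fit, test = "Wilks")  # one named test per call
summary(fit, test = "W")      # partial matching also works
```

Requesting several tests requires several calls; passing a vector of names reproduces the 'arg' must be of length 1 error from the original post.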
Re: [R] need help with understanding stepAIC
Dear Balavelan, First, I would suggest including the main effects of all variables in an interaction. So your model with the interaction should be y ~ a + b + c + d + c:d, which can be abbreviated to y ~ a + b + c*d. Furthermore, you should take a look at ?predict.glm and pay attention to the newdata and type = "response" arguments. HTH, Thierry

ir. Thierry Onkelinx, Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest, Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance, Gaverstraat 4, 9500 Geraardsbergen, Belgium, tel. +32 54/436 185, [EMAIL PROTECTED], www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Bala Thanigaivelan Sent: Thursday 3 April 2008 4:22 To: r-help@r-project.org Subject: [R] need help with understanding stepAIC

Hi, I am currently using R extensively to develop a statistical model and am learning its capabilities. I am trying to develop a logistic regression model using glm and stepAIC. I constructed my formula as y ~ a + b + c + d. Now I wish to study the interaction between variables, so I did y ~ a + b + c:d + d. The software generated the coefficients for a, b, c:d, d. Now I want to compute the probability of various events using the logistic regression formula Prob = exp(y)/(1+exp(y)). How do I compute the value of y for the formula with interaction variables? What does c:d represent, given a dataset?
Regards, Balavelan
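A small runnable sketch of Thierry's advice (all data invented): fit with the main effects plus the interaction, and let predict() evaluate the linear predictor and the probability, rather than assembling exp(y)/(1+exp(y)) by hand. The interaction term c:d contributes coefficient("c:d") * c * d to y for each observation.

```r
set.seed(1)
d <- data.frame(a = rnorm(100), b = rnorm(100),
                c = rnorm(100), d = rnorm(100))
# invented binary outcome depending on a and on the c-by-d interaction
d$y <- rbinom(100, 1, plogis(d$a + d$c * d$d))

# main effects plus interaction: y ~ a + b + c*d expands to a + b + c + d + c:d
fit <- glm(y ~ a + b + c * d, data = d, family = binomial)

newobs <- data.frame(a = 0, b = 0, c = 1, d = 2)
eta <- predict(fit, newdata = newobs)                    # linear predictor y
p   <- predict(fit, newdata = newobs, type = "response") # Prob = exp(y)/(1+exp(y))
```

Here type = "response" does the logistic transformation for you, so the manual formula is only needed to understand what is going on.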
Re: [R] Thinking about using two y-scales on your plot?
Allen S. Rout wrote: ... I've got a series of graphs I generate in R to illustrate backup activity. These are scaled, primarily, in bytes. But different activities have different costs per byte, and I augment the bytes scale with dollars. http://docs.osg.ufl.edu/tsm/current/ext/UFEXCH-MBX01.AD.UFL.EDU-all.html So the blue series corresponds to the blue dollars scale (and bytes) and the green and red points correspond to the green scale (and bytes). Am I being naughty?

Hi Allen, If misleading someone is naughty, I guess we've all done that... The difficulty you have is the visual impression that the storage charges are higher than the transfer charges. You are trying to squeeze three dimensions (daily amount of information, daily cost of transfer, and daily cost of storage) into two. Because the values are actually plotted against the first dimension and the cost scales are tacked onto that, the visual metric would correspond to the numbers only where the two cost dimensions coincide, in which case you wouldn't have a problem, of course. I would tend to make the visual comparison on the money dimension and add supplementary axes showing the amount of information for each aspect of the cost.
Here's a rough example:

# estimate the first two weeks of data
ufexch <- data.frame(
  transGb = c(450, 700, 200, 550, 550, 545, 550, 550, 195, 180, 555, 555, 550, 550),
  storGb  = c(900, 1000, 1020, 1025, 1030, 1035, 1045, 1045, 1050, 1050, 1050, 1055, 1060, 1065))
# roughly calculate dollars per day
ufexch$transdd <- ufexch$transGb/2
ufexch$stordd <- ufexch$storGb/1000
par(mar = c(5, 4, 4, 4))
plot(c(ufexch$transdd, NA, 0.2), xlim = c(1, 14), ylim = c(0.2, 400),
     col = "green", pch = 2, log = "y", main = "Cost of backups",
     ylab = "Dollars/day", xlab = "Date", axes = FALSE, type = "b")
points(ufexch$stordd, col = "blue", pch = 1, type = "b")
box()
axis(2)
axis(1, at = 1:14, labels = paste(1:14, "Feb", sep = "/"))
axis(4, at = c(100, 300, 500), labels = c(500, 800, 1000), col = "green")
axis(4, at = c(0.5, 0.9, 1.4), labels = c(500, 900, 1400), col = "blue")
mtext("Gigabytes", 4, 2)
legend(6, 20, c("Storage", "Transfer"), pch = 1:2, col = c(4, 3))

Jim
Re: [R] Thinking about using two y-scales on your plot?
thegeologician wrote: ... very often, time-series plots of some values are given to show the temporal correlation of these rather than the actual numerical values. The same applies to plots of sample values over distance (e.g. element concentration over a sample or investigation area). In this case one is more interested in whether some values change simultaneously than in what the actual values at every point are. In the mentioned plot (see link below), the temporal evolution of the mean temperature and of the precipitation over a year is the important information.

If temporal correlation is what you are interested in, then why not plot that? If you also care about the evolution of temperature and precipitation, then these can be plotted on individual graphs, to give three graphs in total, each with a common x-axis (time) and each showing one variable of interest on its y-axis. This way the problems of multiple y-axes are avoided. Regards, Richie. Mathematical Sciences Unit HSL -- View this message in context: http://www.nabble.com/Thinking-about-using-two-y-scales-on-your-plot--tp16290293p16467373.html Sent from the R help mailing list archive at Nabble.com.
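A concrete version of this three-panel suggestion (all values invented for illustration): stacked panels share the time axis, and the correlation itself gets its own panel instead of being implied by overlaid scales.

```r
op <- par(mfrow = c(3, 1), mar = c(2, 4, 1, 1))
month  <- 1:12
temp   <- c(2, 3, 6, 9, 13, 16, 18, 17, 14, 10, 6, 3)       # mean temperature
precip <- c(50, 40, 45, 55, 60, 70, 80, 75, 65, 60, 55, 50) # precipitation

plot(month, temp,   type = "b", ylab = "Temperature (C)",    xlab = "")
plot(month, precip, type = "b", ylab = "Precipitation (mm)", xlab = "")
# third panel: pointwise product of the standardized series, a rough
# picture of where the two variables move together
plot(month, scale(temp)[, 1] * scale(precip)[, 1], type = "h",
     ylab = "Standardized product", xlab = "Month")
par(op)

r <- cor(temp, precip)  # one number summarizing the temporal correlation
```

Each panel keeps its own honest y-scale, and the reader is never asked to compare heights across incommensurable axes.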
[R] Reloading java classes with rJava
Dear R people, I have recently started using rJava to interact with Java code that I have written and am finding it a very useful bridge. Thanks! I have now run into a problem I can't figure out. If I edit and recompile my Java source code, I would like to force rJava to reload the modified Java class. However, even doing:

.jinit(force.init=TRUE)

to restart the JVM doesn't seem to do this. Is this possible without restarting R? Many thanks, Greg.

rJava 0.5-1 from CRAN
platform i386-apple-darwin8.10.1, arch i386, os darwin8.10.1, system i386, darwin8.10.1, status major 2, minor 6.2, year 2008, month 02, day 08, svn rev 44383, language R, version.string R version 2.6.2 (2008-02-08)

-- Gregory Jefferis, PhD and: Research Fellow, Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ / St John's College, Cambridge CB2 1TP, United Kingdom. Lab Tel: +44 (0)1223 336683 Office: +44 (0)1223 339899 Lab Fax: +44 (0)1223 336676 [EMAIL PROTECTED] http://www.zoo.cam.ac.uk/zoostaff/jefferis.html http://www.neuroscience.cam.ac.uk/directory/profile.php?gsxej2 http://flybrain.stanford.edu
[R] How to ask for *fixed* number of distributions under parameterized Gaussian mixture model.
Dear R users: I am wondering how to ask for a *fixed* number of distributions under a parameterized Gaussian mixture model. I know that em() and some related functions can fit the parameterized Gaussian mixture model. However, there seems to be no parameter for setting the number of distributions to be mixed (if we know the value in advance). That is, assume I know the (mixed) data come from 3 different distributions. The output, however, may indicate that the number of distributions that form the data is 4. How can I specify in advance that the number of distributions is 3? Thanks a lot for your help.
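It is not stated which package the poster's em() comes from, but in the mclust package (a common source of em() and friends) the number of components can be fixed through the G argument, e.g. Mclust(x, G = 3) (hedged: check ?Mclust in the installed version). The underlying idea, EM with the component count K held fixed rather than selected by the algorithm, can also be sketched in base R for the univariate case (data invented):

```r
# minimal EM for a K-component univariate Gaussian mixture, K fixed in advance
em_gauss <- function(x, K, iter = 200) {
  n <- length(x)
  mu <- unname(quantile(x, probs = (1:K) / (K + 1)))  # spread-out initial means
  sg <- rep(sd(x), K)
  pi_k <- rep(1 / K, K)
  for (it in seq_len(iter)) {
    # E-step: responsibility of component k for each point
    dens <- sapply(1:K, function(k) pi_k[k] * dnorm(x, mu[k], sg[k]))
    r <- dens / rowSums(dens)
    # M-step: weighted mean, sd, and mixing proportion updates
    nk <- colSums(r)
    mu <- colSums(r * x) / nk
    sg <- sqrt(colSums(r * (x - rep(mu, each = n))^2) / nk)
    pi_k <- nk / n
  }
  list(mean = mu, sd = sg, prop = pi_k)
}

set.seed(1)
x <- c(rnorm(200, 0, 1), rnorm(200, 8, 1), rnorm(200, 16, 1))
fit <- em_gauss(x, K = 3)   # always returns exactly 3 components
```

The point of the sketch is that K enters only as the number of columns in the responsibility matrix, so fixing it is trivial; model-selection packages merely wrap a loop over K around this and pick the best BIC.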
[R] data.frame or list
Dear R list, I'm having difficulties choosing between a list, a data.frame, or an array for the storage and manipulation of my data (example follows). I've been using all three for different purposes, but I would like to know which is best adapted to which task. Here is the data I'm currently working on: 200 observations, each observation being a vector of length 1000, depending on several factors (polarisation, pitch, dose, size).

x <- seq(1, 100, length = 1000)
observations <- matrix(rnorm(200*1000), ncol = 200)
# factors
polarisation <- rep(c(0, 90), each = 100, length = 200)
pitch <- rep(1:5, length = 200)
dose <- rep(1:2, each = 100, length = 200)
size <- rep(letters[1:6], each = 100, length = 200)
my.data <- list(x = x, observations = as.data.frame(observations),
                polarisation = factor(polarisation), pitch = factor(pitch),
                dose = factor(dose), size = factor(size))

I would like to be able to manipulate the data in observations using the factors as opposed to column indices. For instance, I could plot all the observations corresponding to polarisation == 90 and pitch == 1, as in

with(my.data, matplot(x, subset((pitch == 1) & (polarisation == 90), observations), type = "l"))

which doesn't work, so I've had to use

with(my.data, matplot(x, observations[, (pitch == 1) & (polarisation == 90)], type = "l"))

Is this a sensible way to store and manipulate this sort of data? Is there anything more appropriate I've overlooked using data.frames only? Many thanks, baptiste

_ Baptiste Auguié, Physics Department, University of Exeter, Stocker Road, Exeter, Devon, EX4 4QL, UK. Phone: +44 1392 264187 http://newton.ex.ac.uk/research/emag http://projects.ex.ac.uk/atto
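One alternative worth considering (a sketch with made-up, smaller dimensions, not the poster's real data): store everything in a single long-format data.frame, one row per (observation, x) pair, with the factors repeated down the rows. Then subset() filters on the factors directly, with no manual column indexing.

```r
set.seed(1)
nobs <- 8   # number of observations (200 in the real data)
npts <- 5   # points per observation (1000 in the real data)
pol   <- rep(c(0, 90), length.out = nobs)  # one factor level per observation
pitch <- rep(1:4, length.out = nobs)

long <- data.frame(
  obs          = factor(rep(seq_len(nobs), each = npts)),
  x            = rep(seq(1, 100, length.out = npts), nobs),
  value        = rnorm(nobs * npts),
  polarisation = factor(rep(pol,   each = npts)),
  pitch        = factor(rep(pitch, each = npts))
)

# factor-based selection, no column arithmetic
sub <- subset(long, polarisation == 90 & pitch == 2)
```

With this layout, conditioned plotting also becomes available, e.g. lattice::xyplot(value ~ x | polarisation, groups = obs, data = long, type = "l") if the lattice package is used; the cost is roughly a factor-of-factors increase in memory from repeating the factor columns.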
Re: [R] Overdispersion in count data
At 17:03 02/04/2008, Wade Wall wrote: Hi all, I have count data (number of flowering individuals plus total number of individuals) across 24 sites and 3 treatments (time since last burn). Following recommendations in the R Book, I used a glm with the model y ~ burn, with y being two columns (flowering, not flowering) and burn the time (category) since burn. However, the residual deviance is roughly 10 times the number of degrees of freedom, and using the quasibinomial distribution doesn't change this. Any suggestions as to why the quasibinomial distribution doesn't change the residual deviance and how I should proceed? I know that this level of residual deviance is unacceptable, but I am not sure if transformations are in order.

You have received much helpful advice from Gavin and Achim and others, but I wonder whether they are answering the question in your title rather than in your post. Are you doing something like

fit <- glm(cbind(flower, notflower) ~ burn, family = binomial)

You might find it helpful to read the relevant section in MASS (see quasibinomial in the index) or in some other text. Needless to say, I am at the outer limits of my statistical knowledge. Thanks for any help, Wade Wall

Michael Dewey http://www.aghmed.fsnet.co.uk
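A self-contained illustration of the model sketched above (all numbers invented), which also shows why switching to quasibinomial cannot change the residual deviance: the two families produce identical fits, and only the dispersion used for standard errors differs.

```r
set.seed(1)
d <- data.frame(burn  = factor(rep(c("1yr", "3yr", "5yr"), each = 8)),
                total = rpois(24, 40) + 5)
d$flower <- rbinom(24, size = d$total, prob = 0.4)

# two-column response: successes and failures
fit  <- glm(cbind(flower, total - flower) ~ burn, data = d,
            family = binomial)
fitq <- glm(cbind(flower, total - flower) ~ burn, data = d,
            family = quasibinomial)

deviance(fit) - deviance(fitq)   # 0: the residual deviance is unchanged
summary(fitq)$dispersion         # only the dispersion estimate moves
```

So a large residual deviance relative to the degrees of freedom survives the switch to quasibinomial by construction; what quasibinomial changes is the inference (wider standard errors), not the fit.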
[R] Lapack error in Design:::ols
Hi, I'm trying to use Frank Harrell's Design:::ols function to do regression of y (numeric) on the interaction of two factors (x1 and x2), but Lapack throws an error:

library(Design)
...
load(url("http://www.csse.unimelb.edu.au/~gabraham/x"))
ols(y ~ x1 * x2, data=x)
Error in chol2inv(fit$qr$qr) : 'size' cannot exceed nrow(x) = 20
traceback()
6: .Call("La_chol2inv", x, size, PACKAGE = "base")
5: chol2inv(fit$qr$qr)
4: ols(y ~ x1 * x2, data = x)
3: eval.with.vis(expr, envir, enclos)
2: eval.with.vis(ei, envir)
1: source("t.R", echo = TRUE)

Any ideas? Thanks, Gad

sessionInfo() R version 2.6.2 (2008-02-08) x86_64-pc-linux-gnu locale: LC_CTYPE=en_AU.UTF-8;LC_NUMERIC=C;LC_TIME=en_AU.UTF-8;LC_COLLATE=en_AU.UTF-8; LC_MONETARY=en_AU.UTF-8;LC_MESSAGES=en_AU.UTF-8;LC_PAPER=en_AU.UTF-8;LC_NAME=C; LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_AU.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Design_2.1-1 survival_2.34 Hmisc_3.4-3 loaded via a namespace (and not attached): [1] cluster_1.11.9 grid_2.6.2 lattice_0.17-4 rcompgen_0.1-17

-- Gad Abraham, Dept. CSSE and NICTA, The University of Melbourne, Parkville 3010, Victoria, Australia. email: [EMAIL PROTECTED] web: http://www.csse.unimelb.edu.au/~gabraham
Re: [R] why is text being rasterized with text()
Thank you for looking at my code. You are absolutely right. I'm mortified that I didn't see the text() tucked inside the for loop. Once I took it out of the loop, the text() works fine. Thanks, Andrew

On Thu, Apr 3, 2008 at 1:52 AM, Prof Brian Ripley [EMAIL PROTECTED] wrote: We haven't been told your OS or the graphics device or how you viewed PDF output. But all text on screen devices is rasterized -- they are raster devices. (That includes your pdf viewer.) You are writing the label 9 times in the same place. I suspect what you are seeing is a viewer artifact of doing so -- the antialiasing isn't working for you.

On Thu, 3 Apr 2008, Andrew Yee wrote: Here's a question: I noticed that when I tried to create this simple graph of rectangles and text, R appears to generate text that is rasterized (this is seen both on the monitor and when the output is directed to a pdf file). Any thoughts?

value.seq <- c(4, as.character(seq(from = 4, to = 10)), 11)
frame()
par(usr = c(0, 10, 0, 10))
for (r in 1:9) {
  rect(r, 3, r + 1, 3.75, border = NA, col = heat.colors(9)[r])
  # this text does not appear to be rasterized
  text(r + 0.5, 2.75, value.seq[r], cex = 0.5)
  # this text appears to be rasterized
  text(5.5, 2.25, expression(log2), cex = 0.5)
}

Thanks, Andrew

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Overdispersion in count data
That is exactly how I am writing it. glm works fine, but as I stated, the residual deviance is much greater (10x) than the degrees of freedom. I want to take a look at using the negative binomial distribution, but I can't get glm.nb to work. I get the message Error: (subscript) logical subscript too long. I have used traceback() and it seems to be in the glm.fitter function, but as I say, I am at the limit of my abilities here. Wade

On Thu, Apr 3, 2008 at 7:23 AM, Michael Dewey [EMAIL PROTECTED] wrote: At 17:03 02/04/2008, Wade Wall wrote: Hi all, I have count data (number of flowering individuals plus total number of individuals) across 24 sites and 3 treatments (time since last burn). Following recommendations in the R Book, I used a glm with the model y ~ burn, with y being two columns (flowering, not flowering) and burn the time (category) since burn. However, the residual deviance is roughly 10 times the number of degrees of freedom, and using the quasibinomial distribution doesn't change this. Any suggestions as to why the quasibinomial distribution doesn't change the residual deviance and how I should proceed? I know that this level of residual deviance is unacceptable, but I am not sure if transformations are in order. You have received much helpful advice from Gavin and Achim and others, but I wonder whether they are answering the question in your title rather than in your post. Are you doing something like fit <- glm(cbind(flower, notflower) ~ burn, family = binomial) You might find it helpful to read the relevant section in MASS (see quasibinomial in the index) or in some other text. Needless to say, I am at the outer limits of my statistical knowledge.
Thanks for any help, Wade Wall

Michael Dewey http://www.aghmed.fsnet.co.uk
Re: [R] Lapack error in Design:::ols
This is not an LAPACK error, but the R function chol2inv complaining that it is being misused. The problem is that you have a vastly overparametrized model (using recover()):

Browse[1]> dim(X)
[1]  20 143

and ols() does not detect that, whereas lm() would. I'd suggest ols() is to blame here.

On Thu, 3 Apr 2008, Gad Abraham wrote: Hi, I'm trying to use Frank Harrell's Design:::ols function to do regression of y (numeric) on the interaction of two factors (x1 and x2), but Lapack throws an error:

library(Design)
...
load(url("http://www.csse.unimelb.edu.au/~gabraham/x"))
ols(y ~ x1 * x2, data=x)
Error in chol2inv(fit$qr$qr) : 'size' cannot exceed nrow(x) = 20
traceback()
6: .Call("La_chol2inv", x, size, PACKAGE = "base")
5: chol2inv(fit$qr$qr)
4: ols(y ~ x1 * x2, data = x)
3: eval.with.vis(expr, envir, enclos)
2: eval.with.vis(ei, envir)
1: source("t.R", echo = TRUE)

Any ideas? Thanks, Gad

sessionInfo() R version 2.6.2 (2008-02-08) x86_64-pc-linux-gnu locale: LC_CTYPE=en_AU.UTF-8;LC_NUMERIC=C;LC_TIME=en_AU.UTF-8;LC_COLLATE=en_AU.UTF-8; LC_MONETARY=en_AU.UTF-8;LC_MESSAGES=en_AU.UTF-8;LC_PAPER=en_AU.UTF-8;LC_NAME=C; LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_AU.UTF-8;LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Design_2.1-1 survival_2.34 Hmisc_3.4-3 loaded via a namespace (and not attached): [1] cluster_1.11.9 grid_2.6.2 lattice_0.17-4 rcompgen_0.1-17 -- Gad Abraham, Dept. CSSE and NICTA, The University of Melbourne, Parkville 3010, Victoria, Australia. email: [EMAIL PROTECTED] web: http://www.csse.unimelb.edu.au/~gabraham

-- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
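The overparametrization described above can be reproduced with plain lm(), which flags the unestimable coefficients as NA instead of stopping with an opaque low-level error. The factors below are invented for illustration: 30 coefficients are requested from only 20 observations.

```r
set.seed(1)
# two crossed factors with 6 and 5 levels: the interaction model asks for
# 6 * 5 = 30 coefficients, but there are only 20 rows
x1 <- factor(rep(letters[1:6], length.out = 20))
x2 <- factor(rep(LETTERS[1:5], length.out = 20))
y  <- rnorm(20)

fit <- lm(y ~ x1 * x2)
length(coef(fit))       # 30 coefficients requested
fit$rank                # at most 20 are estimable
sum(is.na(coef(fit)))   # the rest are reported as NA (aliased)
```

This is the behaviour the reply contrasts with ols(): lm() detects the rank deficiency during the QR decomposition and degrades gracefully.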
Re: [R] Overdispersion in count data
At 12:54 03/04/2008, Wade Wall wrote: That is exactly how I am writing it. glm works fine, but as I stated, the residual deviance is much greater (10x) than the degrees of freedom. I want to take a look at using the negative binomial distribution, but I can't get glm.nb to work. I get the message Error: (subscript) logical subscript too long. I have used traceback() and it seems to be in the glm.fitter function, but as I say, I am at the limit of my abilities here.

For Poisson models and for the negative binomial you have a single outcome, a count. For the binomial you can have two columns of counts of successes and failures (there are other ways of arranging your data). I think you might want to try the beta-binomial, which is available, I think, in aod. However, I still think reading the relevant section of MASS first would be a good idea (or some equivalent text).

Wade On Thu, Apr 3, 2008 at 7:23 AM, Michael Dewey [EMAIL PROTECTED] wrote: At 17:03 02/04/2008, Wade Wall wrote: Hi all, I have count data (number of flowering individuals plus total number of individuals) across 24 sites and 3 treatments (time since last burn). Following recommendations in the R Book, I used a glm with the model y ~ burn, with y being two columns (flowering, not flowering) and burn the time (category) since burn. However, the residual deviance is roughly 10 times the number of degrees of freedom, and using the quasibinomial distribution doesn't change this. Any suggestions as to why the quasibinomial distribution doesn't change the residual deviance and how I should proceed? I know that this level of residual deviance is unacceptable, but I am not sure if transformations are in order. You have received much helpful advice from Gavin and Achim and others, but I wonder whether they are answering the question in your title rather than in your post. Are you doing something like fit <- glm(cbind(flower, notflower) ~ burn, family = binomial) You might find it helpful to read the relevant section in MASS (see quasibinomial in the index) or in some other text. Needless to say, I am at the outer limits of my statistical knowledge. Thanks for any help, Wade Wall

Michael Dewey http://www.aghmed.fsnet.co.uk
Re: [R] Overdispersion in count data
On Thu, 3 Apr 2008, Wade Wall wrote: That is exactly how I am writing it. glm works fine, but as I stated, the residual deviance is much greater (10x) than the degrees of freedom. I want to take a look at using the negative binomial distribution, but I can't get glm.nb to work. I get the message Error: (subscript) logical subscript too long. I have used traceback() and it seems to be in the glm.fitter function, but as I say, I am at the limit of my abilities here. But we can only help you debug this if we have a reproducible example. Wade On Thu, Apr 3, 2008 at 7:23 AM, Michael Dewey [EMAIL PROTECTED] wrote: At 17:03 02/04/2008, Wade Wall wrote: Hi all, I have count data (number of flowering individuals plus total number of individuals) across 24 sites and 3 treatments (time since last burn). Following recommendations in the R Book, I used a glm with the model y ~ burn, with y being two columns (flowering, not flowering) and burn the time (category) since burn. However, the residual deviance is roughly 10 times the number of degrees of freedom, and using the quasibinomial distribution doesn't change this. Any suggestions as to why the quasibinomial distribution doesn't change the residual deviance and how I should proceed? I know that this level of residual deviance is unacceptable, but I am not sure if transformations are in order. You have received much helpful advice from Gavin and Achim and others, but I wonder whether they are answering the question in your title rather than in your post. Are you doing something like fit <- glm(cbind(flower, notflower) ~ burn, family = binomial) You might find it helpful to read the relevant section in MASS (see quasibinomial in the index) or in some other text. Needless to say, I am at the outer limits of my statistical knowledge.
Thanks for any help, Wade Wall [[alternative HTML version deleted]] Michael Dewey http://www.aghmed.fsnet.co.uk [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Overdispersion in count data
On Thu, 2008-04-03 at 01:24 +, David Winsemius wrote: Wade Wall [EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]: Thanks for the recommendations and insights. I tried using glm.nb, but it didn't seem to like my data. I received the message "(subscript) logical subscript too long". I am using the same dataframe as my previous glm. Do you know if I need to put the data in a different format? I was wondering about your data layout. You said you had the flower/no-flower data in two different columns. That is not the way I usually offer data to glm(). I would have imagined that log(burn_time) would have been an offset. It might help if you at least offered the audience a sample of ten rows, the results of str() for the data.frame, and the call to the glm function. David, Wade, You can supply a two-column matrix to glm() with families (quasi)binomial, as the two columns represent successes and failures respectively; see ?glm and the Details section. However this is not possible with a (quasi)poisson family or with glm.nb, which is one reason why Wade might have been getting the error. My fault here - I didn't grep exactly what Wade had written. Even if this isn't causing the error, Wade won't be fitting the model he thought he was with a two-column response in glm.nb. In a similar vein to David's suggestion, could one not use an offset, but on the total number of flowers sampled in each location, e.g. offset(log(totalPlants)) in the formula, where totalPlants is the variable containing the total number of plants encountered. You need to include this in the formula for glm.nb as that function does not have an argument 'offset'. Wade's y is then just a vector of counts of flowering individuals. In this way one would account for differences in the number of flowers encountered between sites. HTH G -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr.
Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
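Gavin's suggestion can be sketched concretely. The data below are invented for illustration (not Wade's actual data); the two key points are that glm.nb() lives in the MASS package and that the offset must go into the formula itself:

```r
library(MASS)

# Hypothetical site-level data: counts of flowering plants, total
# plants, and burn category per site (invented, not Wade's data)
dat <- data.frame(
  flowering   = c(12, 30, 7, 25, 3, 18),
  totalPlants = c(40, 90, 35, 60, 20, 55),
  burn        = factor(c("1yr", "1yr", "5yr", "5yr", "10yr", "10yr"))
)

# The response is a plain vector of counts; the offset appears in the
# formula because glm.nb() has no separate 'offset' argument
fit <- glm.nb(flowering ~ burn + offset(log(totalPlants)), data = dat)
summary(fit)
```

With the log(totalPlants) offset, the fitted coefficients are on the scale of flowering rate per plant, which accounts for the differing numbers of plants encountered between sites.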
Re: [R] Overdispersion in count data
Alright, I feel stupid now. That was the problem. For glm you can use both successes and failures, while with the negative binomial it is simply a count. That is why I was getting the subscript too long message. I understand generalized linear models, but I haven't worked with negative binomial distribution models, so I will read up on it before I try to use it. RTFM, right? I will read about using the beta-binomial for overdispersed data. Thanks for your help, Wade On Thu, Apr 3, 2008 at 8:30 AM, Michael Dewey [EMAIL PROTECTED] wrote: At 12:54 03/04/2008, Wade Wall wrote: That is exactly how I am writing it. glm works fine, but as I stated the residual deviance is much greater (10x) than the degrees of freedom. I want to take a look at using the negative binomial distribution, but I can't get glm.nb to work. I get the message "Error: (subscript) logical subscript too long". I have used traceback() and it seems to be in the glm.fitter function, but as I say I am at the limit of my abilities here. For Poisson models and for the negative binomial you have a single outcome, a count. For the binomial you can have two columns of counts of successes and failures (there are other ways of arranging your data). I think you might want to try the beta-binomial, which is available I think in aod. However I still think reading the relevant section of MASS first would be a good idea (or some equivalent text). Wade On Thu, Apr 3, 2008 at 7:23 AM, Michael Dewey [EMAIL PROTECTED] wrote: At 17:03 02/04/2008, Wade Wall wrote: Hi all, I have count data (number of flowering individuals plus total number of individuals) across 24 sites and 3 treatments (time since last burn). Following recommendations in the R Book, I used a glm with the model y ~ burn, with y being two columns (flowering, not flowering) and burn the time (category) since burn.
However, the residual deviance is roughly 10 times the number of degrees of freedom, and using the quasibinomial distribution doesn't change this. Any suggestions as to why the quasibinomial distribution doesn't change the residual deviance and how I should proceed? I know that this level of residual deviance is unacceptable, but am not sure if transformations are in order. You have received much helpful advice from Gavin and Achim and others but I wonder whether they are answering the question in your title rather than in your post. Are you doing something like

fit <- glm(cbind(flower, notflower) ~ burn, family = binomial)

You might find it helpful to read the relevant section in MASS (see quasibinomial in the index) or in some other text. Needless to say that I am at the outer limits of my statistical knowledge. Thanks for any help, Wade Wall [[alternative HTML version deleted]] Michael Dewey http://www.aghmed.fsnet.co.uk [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
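The distinction Michael draws (two-column response for the binomial, single count vector for the negative binomial) can be illustrated side by side with toy data (invented here, not Wade's):

```r
library(MASS)

# Toy data: flowering vs not-flowering counts per site (hypothetical)
flower    <- c(10, 25, 5, 12, 30, 7)
notflower <- c(30, 65, 30, 28, 60, 23)
burn      <- factor(c("1yr", "5yr", "10yr", "1yr", "5yr", "10yr"))

# Binomial family: a two-column matrix of successes and failures is fine
fit.bin <- glm(cbind(flower, notflower) ~ burn, family = binomial)

# Negative binomial: the response must be a single vector of counts;
# passing cbind(flower, notflower) here is what triggers errors like the
# "logical subscript too long" message Wade saw
fit.nb <- glm.nb(flower ~ burn)
```

On such a tiny, well-behaved data set glm.nb() may warn while estimating theta, but the point is the shape of the response, not the fit itself.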
[R] sqldf file specification, non-ASCII
Dear R-Listers, I am a Windows user (R 2.6.2) using the development version of sqldf to try to read a 3GB file originally stored in .sas7bdat format. I convert it to comma-delimited ASCII format with StatTransfer before trying to import just the rows I need into R. The problem is that I get this error:

f <- file("hugedata.csv")
DF <- sqldf("select * from f where C_OPR like 'KKA2%'", file.format=list(header=T, row.names=F))
Error in try({ : RS-DBI driver: (RS_sqlite_import: hugedata.csv line 1562740 expected 52 columns of data but found 19)
Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: f)

Now, I know that my SAS-using colleagues are able to use this file with SAS, so I was wondering whether StatTransfer'ing it to the SAS XPORT format, which can be read with the 'read.xport' function in the 'foreign' package, would be a better approach. The problem is, I don't know how/whether I can do that at all with sqldf. I tried various ways like f <- file(read.xport("hugedata.xport")) but I consistently got an error message from the sqldf command. I don't recall the exact error message, unfortunately, but can anybody tell me whether it is at all possible to read in files in non-ASCII format without having to put them in R memory? Thank you for your assistance. Peter. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merge with rownames?
On 4/3/2008 10:47 AM, Mark Kimpel wrote: Can merge be tricked into merging via rownames as opposed to via contents of a particular column? I have two data.frames with overlapping, but out of order, rownames, but no column contents in common and would like to merge without cbinding the rownames to the data.frames. Mark The Details section of the help page for merge suggests you can use either merge(..., by=0) or merge(..., by="row.names"). ?merge -- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
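Chuck's pointer can be checked with a small self-contained example:

```r
# Two data frames with overlapping, out-of-order row names and no
# columns in common
A <- data.frame(x = 1:3, row.names = c("a", "b", "c"))
B <- data.frame(y = c(30, 10, 20), row.names = c("c", "a", "b"))

# by = 0 (equivalently by = "row.names") merges on the row names;
# the row names land in a new column called "Row.names"
m <- merge(A, B, by = 0)
m
#   Row.names x  y
# 1         a 1 10
# 2         b 2 20
# 3         c 3 30
```

If the original row names are wanted back, one can follow up with rownames(m) <- m$Row.names and drop the extra column.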
Re: [R] sqldf file specification, non-ASCII
On 4/3/2008 10:22 AM, Peter Jepsen wrote: Dear R-Listers, I am a Windows user (R 2.6.2) using the development version of sqldf to try to read a 3GB file originally stored in .sas7bdat format. I convert it to comma-delimited ASCII format with StatTransfer before trying to import just the rows I need into R. The problem is that I get this error:

f <- file("hugedata.csv")
DF <- sqldf("select * from f where C_OPR like 'KKA2%'", file.format=list(header=T, row.names=F))
Error in try({ : RS-DBI driver: (RS_sqlite_import: hugedata.csv line 1562740 expected 52 columns of data but found 19)
Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: f)

That error message looks pretty clear: there's a problem on line 1562740. Can you look at that line and spot what the problem is? If you don't have a text editor that can handle big files, you should be able to do it with something like this:

f <- file("hugedata.csv", "r")
skip <- 1562739
while (skip > 1) {
  junk <- readLines(f, 1)
  skip <- skip - 1
}
junk <- readLines(f, skip)
readLines(f, 1)

Now, I know that my SAS-using colleagues are able to use this file with SAS, so I was wondering whether StatTransfer'ing it to the SAS XPORT format which can be read with the 'read.xport' function in the 'foreign' package would be a better approach. R can usually read CSV files without a problem. You've likely got a problem in your file on that line; you just need to figure out what it is, and fix it. (It's possible the sqldf function has a bug, but I'd suspect the file, first.) Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] merge with rownames?
Can merge be tricked into merging via rownames as opposed to via contents of a particular column? I have two data.frames with overlapping, but out of order, rownames, but no column contents in common and would like to merge without cbinding the rownames to the data.frames. Mark -- Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry Indiana University School of Medicine 15032 Hunter Court, Westfield, IN 46074 (317) 490-5129 Work, Mobile VoiceMail (317) 663-0513 Home (no voice mail please) ** [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help with R semantics
Greetings: I'm running R 2.6.2 on a WinXP DELL box with 2 gig RAM. I have created a new glm link function to be used with family = binomial. The function works (although any suggested improvements would be welcome),

logit.FC <- function(POD.floor = 0, POD.ceiling = 1) {
  if (POD.floor < 0 | POD.floor > 1) stop("POD.floor must be between zero and one.")
  if (POD.ceiling < 0 | POD.ceiling > 1) stop("POD.ceiling must be between zero and one.")
  if (POD.ceiling - POD.floor < difference.criterion)
    stop(paste("POD.ceiling-POD.floor difference must be greater than ",
               difference.criterion, " to discourage answer-shopping.", sep=""))
  linkfun <- function(mu) {
    mu <- qlogis( (mu - POD.floor)/(POD.ceiling - POD.floor) )
  }
  linkinv <- function(eta) {
    eta <- POD.floor + (POD.ceiling - POD.floor)*plogis(eta)
  }
  mu.eta <- function(eta) {
    (POD.ceiling - POD.floor)*dlogis(eta)  # derivative of mu with respect to eta
  }
  valideta <- function(eta) TRUE
  link <- paste("logit.FC(", POD.floor, ",", POD.ceiling, ")", sep = "")
  structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta,
                 valideta = valideta, name = link), class = "link-glm")
}

as is evidenced by the call binomial(logit.FC(0,1)). My semantics problem is that I have the name "logit.FC(0,1)", constructed elsewhere using paste(), and I need to remove the quotations in order to use it, since binomial("logit.FC(0,1)") will fail. I know I am missing something obvious and would appreciate any help. Thanks. Charles Annis, P.E. [EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
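Not an answer given in this thread, but for the situation Charles describes (a call held as a character string) a common R idiom is eval(parse(text = ...)). A sketch, assuming logit.FC() and its difference.criterion constant are defined as in the post:

```r
# Hypothetical: the link name was built elsewhere with paste()
link.name <- "logit.FC(0,1)"

# binomial("logit.FC(0,1)") fails because the string is never evaluated
# as a call; parse() turns the text into an unevaluated expression and
# eval() then runs it, yielding the link-glm object binomial() expects
fam <- binomial(eval(parse(text = link.name)))
```

The same idiom works for any family or link call assembled as text, though building the call with functions like call() or bquote() is generally considered cleaner than pasting and re-parsing strings.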
Re: [R] L-BFGS-B needs finite values of 'fn'
Paul, After looking at your objective function and the penalty, I realized that since the contribution of each term to the objective function decreases geometrically, the later terms contribute relatively less to the overall maximum. Hence the numerical estimation of those terms is much less precise compared to the dominant terms. This explains why the ratios x[k]/x[k-1] are not close to b, for small values of b. Now let us try a larger value of b, say, b=0.8. Here are the results from optim, with BFGS.

k <- 1
b <- 0.8
f <- function(x, pen, k, b) {
  n <- length(x)
  r <- sum((b^(0:(n-1)))*log(x)) - pen*(sum(x)-k)^2
  return(r)
}
gr <- function(x, pen, k, b) {
  n <- length(x)
  r <- (b^(0:(n-1)))*(1/x) - 2*pen*(sum(x)-k)
  return(r)
}
nvar <- 10
p0 <- runif(nvar, 0, 20)
sols <- optim(p0, f, gr, method="BFGS", control=list(fnscale=-1), k=k, b=b, pen=1)
sum(sols$par)
[1] 1
sols$par[2:nvar] / sols$par[1:(nvar-1)]
[1] 0.8038902 0.8021216 0.7974323 0.7999681 0.7983918 0.8006497 0.8013940
[8] 0.8044702 0.7879365

As you can see, things are much better! Hope this helps, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan Sent: Wednesday, April 02, 2008 2:03 PM To: 'Paul Smith'; 'R-help' Subject: Re: [R] L-BFGS-B needs finite values of 'fn' Yes, that is very important. If you look at the ratios x[k]/x[k-1], they are very close to 0.3 for the first few components, and then they start slowly diverging (the ratio becomes smaller than 0.3) after that. So, optim is indeed finding a correct solution to the problem that you actually posed. You could increase your penalty to get a solution that is closer to the analytical solution you are expecting. Ravi.
--- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Smith Sent: Wednesday, April 02, 2008 1:57 PM To: R-help Subject: Re: [R] L-BFGS-B needs finite values of 'fn' But let me add the following: the part -200*(sum(x)-k)^2 of my function is a penalty. In truth, I want to maximize sum((b^(0:(n-1)))*log(x)) s.t. sum(x) = k. Paul On Wed, Apr 2, 2008 at 6:48 PM, Paul Smith [EMAIL PROTECTED] wrote: Thanks, Ravi. The analytical solution, (x_1,x_2,...,x_10), should satisfy this equality: x_t / x_(t-1) = 0.3. Unfortunately, the procedure that you suggest does not lead to a solution that satisfies such an equality. Paul On Wed, Apr 2, 2008 at 5:12 PM, Ravi Varadhan [EMAIL PROTECTED] wrote: Paul, Have you tried using BFGS without bounds?

sols <- optim(rep(20, nvar), f, gr, method="BFGS", control=list(fnscale=-1))

This converges to a solution, although I don't know if the converged solution is what you want. Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Smith Sent: Monday, March 31, 2008 2:25 PM To: R-help Subject: Re: [R] L-BFGS-B needs finite values of 'fn' On Mon, Mar 31, 2008 at 2:57 PM, Zaihra T [EMAIL PROTECTED] wrote: try something like this before wrapping up your function, else I guess you'll have to stick to Prof Brian Ripley's suggestion; his suggestions are usually the best bet.
f <- function(x) {
  n <- length(x)
  r <- sum((b^(0:(n-1)))*log(x)) - 200*(sum(x)-k)^2
  if(!is.finite(r)) r <- -1e+20
  return(r)
}

have a nice day.
Re: [R] help with R semantics
Hi, Charles Annis, P.E. Charles.Annis at StatisticalEngineering.com writes:

logit.FC <- function(POD.floor = 0, POD.ceiling = 1) {
  if (POD.floor < 0 | POD.floor > 1) stop("POD.floor must be between zero and one.")
  if (POD.ceiling < 0 | POD.ceiling > 1) stop("POD.ceiling must be between zero and one.")
  if (POD.ceiling - POD.floor < difference.criterion)
    stop(paste("POD.ceiling-POD.floor difference must be greater than ",
               difference.criterion, " to discourage answer-shopping.", sep=""))
  linkfun <- function(mu) {
    mu <- qlogis( (mu - POD.floor)/(POD.ceiling - POD.floor) )
  }
  linkinv <- function(eta) {
    eta <- POD.floor + (POD.ceiling - POD.floor)*plogis(eta)
  }
  mu.eta <- function(eta) {
    (POD.ceiling - POD.floor)*dlogis(eta)  # derivative of mu with respect to eta
  }
  valideta <- function(eta) TRUE
  link <- paste("logit.FC(", POD.floor, ",", POD.ceiling, ")", sep = "")
  structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta,
                 valideta = valideta, name = link), class = "link-glm")
}

Thanks. Charles Annis, P.E. Charles.Annis at StatisticalEngineering.com phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com [[alternative HTML version deleted]] I defined the following link in the psyphy package that might be along similar lines

probit.lambda
function (m = 2, lambda = 0)
{
  m <- as.integer(m)
  if (m < 2) stop("m must be an integer > 1")
  linkfun <- function(mu) {
    mu <- pmax(mu, 1/m + .Machine$double.eps)
    mu <- pmin(mu, 1 - lambda)
    qnorm((m * mu - 1)/(m * (1 - lambda) - 1))
  }
  linkinv <- function(eta) {
    1/m + ((m - 1)/m - lambda) * pnorm(eta)
  }
  mu.eta <- function(eta) ((m - 1)/m - lambda) * dnorm(eta)
  valideta <- function(eta) TRUE
  link <- paste("probit.lambda(", m, ",", lambda, ")", sep = "")
  structure(list(linkfun = linkfun, linkinv = linkinv, mu.eta = mu.eta,
                 valideta = valideta, name = link), class = "link-glm")
}
<environment: namespace:psyphy>

HTH, ken __ R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Design package lrm summary and factors
Hello, I have a question regarding the lrm function and estimating the odds ratio between different levels of a factored variable. The following code example illustrates the problem I am having. I have a data set with an outcome variable (0,1) and an input variable (A,B,C). I would like to estimate the effect of C vs B, but when I perform the summary I only get A vs B and A vs C, even though I specify that I want to see C vs B. Am I missing something? Any help is appreciated.

outcome <- sample(rep(c(0,1),1),1000)
input <- sample(rep(c("A","B","C"),1),1000)
dset <- data.frame(cbind(outcome,input))
dd <- datadist(dset)
options(datadist='dd')
mod <- lrm(outcome ~ input, data=dset, x=TRUE, y=TRUE)
summary(mod, est.all=FALSE, input=c("C","B"))

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Question on spacing around plot and box in lattice
Hello, How can I increase the padding between the axis and the data region (the box containing the figure) in xyplot? An example:

new <- function(x){
  if(x > 0){
    return(x^2)
  }else{
    return(x)
  }
}
x <- seq(-1, 1, length.out=100)
y <- sapply(x, new)
sc <- list()
sc$alternating <- 0
sc$tck <- 0
xyplot(y~x, type='l',
       aspect=0.05,
       scales=sc,
       col='black',
       xlab='',
       ylab='')

When plotted on the quartz (Mac) device, there is very little padding between the bottom of the curve and the lower horizontal axis and between the maximum values of the curve and the upper horizontal axis. How can I increase the padding - top, bottom, left and right? Thanks in advance for your time. Regards Saptarshi Saptarshi Guha | [EMAIL PROTECTED] | http://www.stat.purdue.edu/~sguha [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
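No answer appears in this chunk of the thread, but one simple way to get extra padding (an assumption on my part, not the only lattice mechanism) is to widen the axis limits yourself, for example with extendrange() from grDevices:

```r
library(lattice)

x <- seq(-1, 1, length.out = 100)
y <- ifelse(x > 0, x^2, x)  # same curve as in the question

# extendrange() widens a range by a fraction f on each side, so the
# curve no longer touches the box; here 10% padding all around
xyplot(y ~ x, type = "l",
       xlim = extendrange(x, f = 0.1),
       ylim = extendrange(y, f = 0.1))
```

Because xlim/ylim are ordinary xyplot arguments, this works the same on any device, quartz included.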
Re: [R] R-help Digest, Vol 62, Issue 2
Dear all. Thanks in advance for any help. I need to analyse a long list of time/fluorescence data. I want to make one plot with the standard deviation and the average data. Could you suggest how to do that?

pippo.csv
0 0
4.013 62.96
6.053 112.3
24.14 106.2
...

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf file specification, non-ASCII
Thank you for your help, Duncan and Gabor. Yes, I found an early line feed in line 1562740, so I have corrected that error. The thing is, it takes me many, many hours to save the file, so I would like to confirm that there are no more errors further down the file. The ffe tool sounds like a perfect tool for this job, but it doesn't seem to be available for Windows. Is anybody out there aware of a similar Windows tool? Thank you again for your help. Peter. -Original Message- From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] Sent: 3. april 2008 17:08 To: Peter Jepsen Subject: Re: [R] sqldf file specification, non-ASCII One other thing you could try would be to run it through ffe (fast file extractor), which is a free utility that you can find via Google. Use ffe's loose argument. It can find bad lines, and since it's not dependent on R it would give you an independent check. Regards. On Thu, Apr 3, 2008 at 10:36 AM, Gabor Grothendieck [EMAIL PROTECTED] wrote: Hi, Can you try it with the first 100 lines, say, of the data and also try reading it with read.csv to double-check your arguments (note that sqldf args are similar but not entirely identical to read.csv), and if it still gives this error send me that 100-line file and I will look at it tonight or tomorrow. Regards. On Thu, Apr 3, 2008 at 10:22 AM, Peter Jepsen [EMAIL PROTECTED] wrote: Dear R-Listers, I am a Windows user (R 2.6.2) using the development version of sqldf to try to read a 3GB file originally stored in .sas7bdat format. I convert it to comma-delimited ASCII format with StatTransfer before trying to import just the rows I need into R.
The problem is that I get this error:

f <- file("hugedata.csv")
DF <- sqldf("select * from f where C_OPR like 'KKA2%'", file.format=list(header=T, row.names=F))
Error in try({ : RS-DBI driver: (RS_sqlite_import: hugedata.csv line 1562740 expected 52 columns of data but found 19)
Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: f)

Now, I know that my SAS-using colleagues are able to use this file with SAS, so I was wondering whether StatTransfer'ing it to the SAS XPORT format, which can be read with the 'read.xport' function in the 'foreign' package, would be a better approach. The problem is, I don't know how/whether I can do that at all with sqldf. I tried various ways like f <- file(read.xport("hugedata.xport")) but I consistently got an error message from the sqldf command. I don't recall the exact error message, unfortunately, but can anybody tell me whether it is at all possible to read in files in non-ASCII format without having to put them in R memory? Thank you for your assistance. Peter. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf file specification, non-ASCII
The Windows version is on sourceforge. On Thu, Apr 3, 2008 at 1:29 PM, Peter Jepsen [EMAIL PROTECTED] wrote: Thank you for your help, Duncan and Gabor. Yes, I found an early line feed in line 1562740, so I have corrected that error. The thing is, it takes me many, many hours to save the file, so I would like to confirm that there are no more errors further down the file. The ffe tool sounds like a perfect tool for this job, but it doesn't seem to be available for Windows. Is anybody out there aware of a similar Windows tool? Thank you again for your help. Peter. -Original Message- From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] Sent: 3. april 2008 17:08 To: Peter Jepsen Subject: Re: [R] sqldf file specification, non-ASCII One other thing you could try would be to run it through ffe (fast file extractor), which is a free utility that you can find via Google. Use ffe's loose argument. It can find bad lines, and since it's not dependent on R it would give you an independent check. Regards. On Thu, Apr 3, 2008 at 10:36 AM, Gabor Grothendieck [EMAIL PROTECTED] wrote: Hi, Can you try it with the first 100 lines, say, of the data and also try reading it with read.csv to double-check your arguments (note that sqldf args are similar but not entirely identical to read.csv), and if it still gives this error send me that 100-line file and I will look at it tonight or tomorrow. Regards. On Thu, Apr 3, 2008 at 10:22 AM, Peter Jepsen [EMAIL PROTECTED] wrote: Dear R-Listers, I am a Windows user (R 2.6.2) using the development version of sqldf to try to read a 3GB file originally stored in .sas7bdat format. I convert it to comma-delimited ASCII format with StatTransfer before trying to import just the rows I need into R.
The problem is that I get this error:

f <- file("hugedata.csv")
DF <- sqldf("select * from f where C_OPR like 'KKA2%'", file.format=list(header=T, row.names=F))
Error in try({ : RS-DBI driver: (RS_sqlite_import: hugedata.csv line 1562740 expected 52 columns of data but found 19)
Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such table: f)

Now, I know that my SAS-using colleagues are able to use this file with SAS, so I was wondering whether StatTransfer'ing it to the SAS XPORT format, which can be read with the 'read.xport' function in the 'foreign' package, would be a better approach. The problem is, I don't know how/whether I can do that at all with sqldf. I tried various ways like f <- file(read.xport("hugedata.xport")) but I consistently got an error message from the sqldf command. I don't recall the exact error message, unfortunately, but can anybody tell me whether it is at all possible to read in files in non-ASCII format without having to put them in R memory? Thank you for your assistance. Peter. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] prettyR 25% quartile, 75% quartile
I am using the describe function in prettyR. I would like to add the 25% and 75% quartiles to the summary table. How do I do this? I have tried

describe(x.f, num.desc=c("mean", "median", "sd", "min", "max", "skewness",
                         quantile(x.f, na.rm=T, probs=seq(0.25, 0.75)), "valid.n"))

help -- Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
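No reply appears in this chunk, but prettyR's describe() expects num.desc to be a character vector of *names* of functions, each accepting an na.rm argument, so a quantile() call cannot be embedded directly. A sketch under that assumption, defining quartile wrappers first (q25/q75 are hypothetical names chosen here):

```r
library(prettyR)

# Wrapper functions that describe() can invoke by name; each one takes
# na.rm, as describe() requires of its summary functions
q25 <- function(x, na.rm = TRUE) quantile(x, probs = 0.25, na.rm = na.rm)
q75 <- function(x, na.rm = TRUE) quantile(x, probs = 0.75, na.rm = na.rm)

x.f <- data.frame(v = rnorm(100))  # hypothetical data, for illustration
describe(x.f, num.desc = c("mean", "median", "sd", "min", "max",
                           "q25", "q75", "valid.n"))
```

The same pattern extends to any custom statistic: wrap it in a function with an na.rm argument and pass its name as a string.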
Re: [R] Nonlinear equation
Robert Mcfadden wrote: Dear R Users, I'm trying to find function that allow me to solve one nonlinear equation. Functions that I found are good for optimization problems. How about ?uniroot -- View this message in context: http://www.nabble.com/Nonlinear-equation-tp16452868p16468039.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
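A minimal illustration of the ?uniroot suggestion: uniroot() solves a single nonlinear equation f(x) = 0, given an interval over which f changes sign.

```r
# Solve x^3 + x - 3 = 0 on [0, 2]; f(0) < 0 and f(2) > 0, so a root
# is bracketed and uniroot() can bisect toward it
f <- function(x) x^3 + x - 3
uniroot(f, interval = c(0, 2))$root
```

For systems of nonlinear equations (rather than one equation in one unknown), packages such as nleqslv or BB would be the usual route instead.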
[R] iterative loop with user input?
Hello R-Users, I would like to use an iterative loop to collect user input from within a function. I'm sure that this would be some combination of for, break, and next but have not been able to get the syntax down. I would like to print some text to the screen at each step in the loop, ask the user for input which is saved in an object, and then advance the loop. Here is an example:

#anchor is a file with unique ids
anchor <- rep(1:30)
anchor <- paste(anchor, "uid", sep="")
#codelist is where I would like to store user input
codelist <- NULL
for(i in 1:30) {
  #tell the user which ID is being coded
  print(paste("You are coding unique ID", anchor[i], sep=": "))
  #Read a line from a text file:
  print(readLines("file_with_30_lines.txt", warn=F)[i])
  #Ask the user for input
  codelist[i] <- readline(paste("Select one of the following: \n \t please enter 1, 2, or 3: \n Enter Your Response HERE: ", sep=""))
}

Ideally, the loop should work from inside a function. Any tips? Thanks in advance for your time and patience. Best, Chris Marcum Sociology/Cal(it)^2 University of California-Irvine __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What to do with this data?
First compute side-by-side boxplots for the two data sets. You will see that the PG group has one (189), maybe 2 (also, 52) extreme values whereas the PB group has none. The PB group will have a smaller median than the PG group. Means, st devs, and se's are legitimate statistics but do not have the usual (normal theory) interpretation, at least until you can account for or eliminate the extreme values. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of mika03 Sent: Thursday, April 03, 2008 2:16 PM To: r-help@r-project.org Subject: [R] What to do with this data? Hello, This is not necessarily a question about R, but more about how we should display our data in general. (We will then use R to do that, once we know what to do ;-) I received good replies about such things in the past on this mailing list so I give it a go. Here's what we did: We showed a fairly large number of subjects search engine queries and different possible search engine responses. We assumed that users would like some of our responses better than others and wanted to check this. Subjects could rate a query/response pair on a scale from 0 (very bad response) to 10 (very good response).
Here are all the judgments we received for one particular class of response to queries which we thought users would like:

Predicted-Good-0, 4
Predicted-Good-1, 1
Predicted-Good-2, 11
Predicted-Good-3, 8
Predicted-Good-4, 25
Predicted-Good-5, 12
Predicted-Good-6, 21
Predicted-Good-7, 25
Predicted-Good-8, 30
Predicted-Good-9, 52
Predicted-Good-10, 189

And here are all the judgments we received for one particular class of response to queries which we thought users would NOT like:

Predicted-Bad-0, 34
Predicted-Bad-1, 23
Predicted-Bad-2, 45
Predicted-Bad-3, 60
Predicted-Bad-4, 42
Predicted-Bad-5, 50
Predicted-Bad-6, 21
Predicted-Bad-7, 20
Predicted-Bad-8, 25
Predicted-Bad-9, 19
Predicted-Bad-10, 39

Here's a small table listing number of observations, mean, standard deviation and standard error:

Type, N, Mean, StDev, StErr
Predicted-Good, 378, 8.21693121693122, 2.47110906286224, 0.12710013550711
Predicted-Bad, 378, 4.5978835978836, 3.02059872953413, 0.155362834286119

The questions we have are: a) It doesn't seem like our data follows a standard distribution. Is it therefore okay to calculate mean, standard deviation and standard error at all? b) We initially created a figure plotting the mean and a bar around it indicating standard deviation. Then somebody who knows more about statistics told us we should display the mean with error bars around it depicting a 95% confidence interval, mean +/- 1.96*SE. But if we do this, aren't we forgetting to mention vital parts of our data, namely that we do get better means for Good responses, but that the individual data points are all over the place (especially for Predicted-Bad)? We would capture this by showing standard deviation. c) And finally: what would be the best way to present this data anyway? Thanks a lot! -- View this message in context: http://www.nabble.com/What-to-do-with-this-data--tp16467948p16467948.html Sent from the R help mailing list archive at Nabble.com.
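Following the boxplot suggestion in the reply above: since the judgments are tabulated counts, rep() can expand them back into raw rating vectors. A minimal sketch in base R (object names are my own):

```r
# Ratings 0..10, expanded from the tabulated counts in the post
good <- rep(0:10, times = c(4, 1, 11, 8, 25, 12, 21, 25, 30, 52, 189))
bad  <- rep(0:10, times = c(34, 23, 45, 60, 42, 50, 21, 20, 25, 19, 39))

# Both groups should have N = 378, and the means should match the table
length(good); length(bad)
mean(good)   # ~8.217
mean(bad)    # ~4.598

# Side-by-side boxplots show the skew that a mean +/- SE plot would hide
boxplot(list(`Predicted-Good` = good, `Predicted-Bad` = bad),
        ylab = "Rating (0 = very bad, 10 = very good)")
```

The expanded vectors reproduce the reported N and means exactly, which is a useful sanity check before plotting.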
Re: [R] bandwidth estimation using bw.SJ
Majnu John wrote: When I use bw.SJ (based on Sheather & Jones, 1991) in R to estimate the bandwidth for highly skewed data, I get the following message: "sample is too sparse to find TD". I played around with the parameters such as no. of bins (nb), lower, upper (range over which to minimize) to no avail. My sample size is 50,000. Can anyone tell me what this means and of some way to get around it? Any help is very much appreciated.

I also get the same error message. It seems to be a function of sample size. The following example runs fine with fewer random numbers.

rdat <- rnorm(11)
hist(rdat, breaks=100)
rdpi <- density(rdat, width="SJ-dpi", n=ceiling(diff(range(rdat))*100))
Erreur dans bw.SJ(x, method = "dpi") : sample is too sparse to find TD

Clearly density() doesn't like being turned up to 11. I didn't find a solution in R, but the following package for Python seems to handle sample sizes at least two orders of magnitude greater than density() in R :-p http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/python/Statistics/manual/index.xhtml Good luck Dave -- View this message in context: http://www.nabble.com/bandwidth-estimation-using-bw.SJ-tp16284293p16470097.html Sent from the R help mailing list archive at Nabble.com.
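One pragmatic workaround (not a fix for bw.SJ itself) is to fall back to one of the other bandwidth selectors built into density(), such as Silverman's rule-of-thumb bw.nrd0(), when bw.SJ() stops with this error. The sketch below is my own illustration, assuming exponential data as a stand-in for "highly skewed":

```r
set.seed(1)
x <- rexp(50000, rate = 0.2)   # skewed sample, same size as in the post

# bw.SJ() can stop with "sample is too sparse to find TD"; catch the error
# and fall back to the rule-of-thumb selector so the density still plots
bw <- tryCatch(bw.SJ(x), error = function(e) bw.nrd0(x))

d <- density(x, bw = bw)
c(bandwidth = d$bw, n = d$n)
```

If the failure is caused by heavily tied (discretized) data rather than sample size, adding a small jitter(x) before calling bw.SJ() sometimes helps as well.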
Re: [R] L-BFGS-B needs finite values of 'fn'
Thanks, Ravi, for your insight. My initial experiments were conducted with b = 0.7, and my procedure (similar to yours) worked reasonably well. Then I wanted to check the quality of my procedure with lower b's, and the weak result motivated my initial post here. I have meanwhile got a reasonably accurate solution with R through the DEoptim package. The number of iterations was 50,000 and the code the following:

library(DEoptim)
k <- 1
b <- 0.3
f <- function(x) {
  n <- length(x)
  if (sum(x) > k) r <- Inf else r <- -sum((b^(0:(n-1))) * log(x))
  return(r)
}
nvar <- 10
(sols <- DEoptim(f, lower=rep(0.1,nvar), upper=rep(k,nvar), control=list(itermax=50000)))
summary(sols)

The result was:

* summary of DEoptim object *
best member: 6999.892 2099.949 630.0255 189.0694 56.85699 16.95149 5.12811 1.527 0.46193 0.13798
best value: -11.91101
after: 50001 iterations
FUN evaluated: 2500050 times

I am still struggling to find a fast way of getting the solution to my problem at least as accurate as the one returned by DEoptim. With so many iterations, DEoptim takes longer than 15 minutes to complete the job. Paul On Thu, Apr 3, 2008 at 4:38 PM, Ravi Varadhan [EMAIL PROTECTED] wrote: Paul, After looking at your objective function and the penalty, I realized that since the contribution of each term to the objective function decreases geometrically, the later terms contribute relatively less to the overall maximum. Hence the numerical estimation of those terms is much less precise compared to the dominant terms. This explains why the ratios x[k]/x[k-1] are not close to b, for small values of b. Now let us try a larger value of b, say, b=0.8. Here are the results from optim, with BFGS.
k <- 1
b <- 0.8
f <- function(x, pen, k, b) {
  n <- length(x)
  r <- sum((b^(0:(n-1))) * log(x)) - pen * (sum(x) - k)^2
  return(r)
}
gr <- function(x, pen, k, b) {
  n <- length(x)
  r <- (b^(0:(n-1))) * (1/x) - 2 * pen * (sum(x) - k)
  return(r)
}
nvar <- 10
p0 <- runif(nvar, 0, 20)
sols <- optim(p0, f, gr, method="BFGS", control=list(fnscale=-1), k=k, b=b, pen=1)
sum(sols$par)
[1] 1
sols$par[2:nvar] / sols$par[1:(nvar-1)]
[1] 0.8038902 0.8021216 0.7974323 0.7999681 0.7983918 0.8006497 0.8013940
[8] 0.8044702 0.7879365

As you can see, things are much better! Hope this helps, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan Sent: Wednesday, April 02, 2008 2:03 PM To: 'Paul Smith'; 'R-help' Subject: Re: [R] L-BFGS-B needs finite values of 'fn' Yes, that is very important. If you look at the ratios x[k]/x[k-1], they are very close to 0.3 for the first few components, and then they start slowly diverging (ratio becomes smaller than 0.3) from that. So, optim is indeed finding a correct solution to the problem that you actually posed. You could increase your penalty to get a solution that is closer to the analytical solution you are expecting. Ravi. --- Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: [EMAIL PROTECTED] Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Smith Sent: Wednesday, April 02, 2008 1:57 PM To: R-help Subject: Re: [R] L-BFGS-B needs finite values of 'fn' But let me add the following: the part - 200*(sum(x)-k)^2 of my function is a penalty. In truth, I want to maximize sum((b^(0:(n-1)))*log(x)) s.t. sum(x) = k. Paul On Wed, Apr 2, 2008 at 6:48 PM, Paul Smith [EMAIL PROTECTED] wrote: Thanks, Ravi. The analytical solution, (x_1,x_2,...,x_10), should satisfy this equality: x_t / x_(t-1) = 0.3. Unfortunately, the procedure that you suggest does not lead to a solution that satisfies such an equality. Paul On Wed, Apr 2, 2008 at 5:12 PM, Ravi Varadhan [EMAIL PROTECTED] wrote: Paul, Have you tried using BFGS
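For reference, the constrained problem (maximize sum(b^(t-1) * log(x_t)) subject to sum(x) = k) has a closed-form solution by Lagrange multipliers: setting b^(t-1)/x_t = lambda for all t and imposing the constraint gives x_t = k*(1-b)*b^(t-1)/(1-b^n), so consecutive ratios are exactly b. The sketch below is my own derivation, not code from the thread; it can serve as a reference point for the penalty/optim and DEoptim runs discussed above:

```r
k <- 1; b <- 0.3; n <- 10

# Lagrangian stationarity: b^(t-1)/x_t = lambda for every t,
# combined with sum(x) = k, yields the geometric solution below
x.exact <- k * (1 - b) * b^(0:(n-1)) / (1 - b^n)

sum(x.exact)                      # equals k, the constraint
x.exact[2:n] / x.exact[1:(n-1)]   # every ratio is exactly b = 0.3
```

With this in hand, the quality of a numerical solution can be judged directly against x.exact rather than only through the ratio test.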
[R] coding for categorical variables with unequal observations
Hi, I am doing multiple regression, and have several X variables that are categorical. I read that I can use dummy or contrast codes for that, but are there any special rules when there are unequal numbers of observations in each group (4 females vs 7 males in a gender variable)? Also, can R generate these codes for me? Thanks.
[R] convert data frame values
Hello: How can I interchange the signs of numeric values in a data frame? Say I have this data frame, test:

  f  s  t
1 a  1 -1
2 b  1 -3
3 c -1  1

I want to flip numbers that are positive to negative, and negative to positive, but only for the numerics in column 's'. My desired result, new.test:

  f  s  t
1 a -1 -1
2 b -1 -3
3 c  1  1

thanks Adrian
Re: [R] coding for categorical variables with unequal observations
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tanya Yatsunenko Sent: Thursday, April 03, 2008 1:55 PM To: r-help@r-project.org Subject: [R] coding for categorical variables with unequal observations Hi, I am doing multiple regression, and have several X variables that are categorical. I read that I can use dummy or contrast codes for that, but are there any special rules when there are unequal numbers of observations in each group (4 females vs 7 males in a gender variable)? Also, can R generate these codes for me? Thanks. You don't need to do anything special, and yes you can just let SAS do it for you. For most of the regression PROCs you can put your categorical variables in a CLASS statement. Depending on which procedure you are using, you may be able to specify whether you want effects or dummy coding, and which level of the categorical variable should be the comparison level. It is also possible to use PROC GLMMOD to create your design variables to be fed into other PROCs. Other approaches are possible as well. If you provide more detail on what analyses you plan to undertake, someone may be able to provide more specific advice. Hope this is helpful, Dan Daniel J. Nordlund Research and Data Analysis Washington State Department of Social and Health Services Olympia, WA 98504-5204
Re: [R] coding for categorical variables with unequal observations
Ignore the previous post. I completely spaced out on which mailing list I was reading and thought this was a SAS question. My apologies, I'll just crawl back into my hole. :-) Obviously not helpful, Dan Daniel J. Nordlund Research and Data Analysis Washington State Department of Social and Health Services Olympia, WA 98504-5204 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nordlund, Dan (DSHS/RDA) Sent: Thursday, April 03, 2008 2:45 PM To: [EMAIL PROTECTED]; r-help@r-project.org Subject: Re: [R] coding for categorical variables with unequal observations -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tanya Yatsunenko Sent: Thursday, April 03, 2008 1:55 PM To: r-help@r-project.org Subject: [R] coding for categorical variables with unequal observations Hi, I am doing multiple regression, and have several X variables that are categorical. I read that I can use dummy or contrast codes for that, but are there any special rules when there're unequal #observations in each groups (4 females vs 7 males in a gender variable)? Also, can R generate these codes for me? THanks. You don't need to do anything special, and yes you can just let SAS do it for you. For most of the regression PROCs you can put your categorical variables in a CLASS statement. Depending on which procedure you are using, you may be able to specify whether you want effects or dummy coding, and which level of the categorical variable should be the comparison level. It is also possible to use PROC GLMMOD to create your design variables to be fed into other PROCs. Other approaches are possible as well. If you provide more detail on what analyses you plan to undertake, someone may be able to provide more specific advice. Hope this is helpful, Dan Daniel J. 
Nordlund Research and Data Analysis Washington State Department of Social and Health Services Olympia, WA 98504-5204
Re: [R] iterative loop with user input?
Is this what you want to do:

x.f <- function(){
  x <- list()
  for (i in 1:10){
    z <- readline(paste("Parameter ", i, ": ", sep=''))
    # just store the input in a list with the same key
    x[[z]] <- z
  }
  x
}

> x.f()
Parameter 1: 12
Parameter 2: 2
Parameter 3: asdf
Parameter 4: 33
Parameter 5: scs
Parameter 6: sdaf2
Parameter 7: 234
Parameter 8: gfd
Parameter 9: 123
Parameter 10: aa
$`12`
[1] "12"
$`2`
[1] "2"
$asdf
[1] "asdf"

On Thu, Apr 3, 2008 at 1:18 PM, Christopher Marcum [EMAIL PROTECTED] wrote: Hello R-Users, I would like to use an iterative loop to collect user input from within a function. [...] Ideally, the loop should work from inside a function. Any tips? Thanks in advance for your time and patience. Best, Chris Marcum Sociology/Cal(it)^2 University of California-Irvine

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
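One caveat worth adding to the approach above: readline() only prompts in interactive sessions; under Rscript it returns "" immediately. A variant that passes the input source in as an argument works both interactively and in batch (and is testable). Function and variable names below are my own, not from the thread:

```r
# Collect one code per ID; read_fun defaults to readline for interactive use
collect_codes <- function(ids, read_fun = readline) {
  codelist <- character(length(ids))
  for (i in seq_along(ids)) {
    codelist[i] <- read_fun(paste0("Code for ID ", ids[i], " (1, 2, or 3): "))
  }
  setNames(codelist, ids)
}

# Interactively: collect_codes(paste0(1:30, "uid"))
# In batch or tests, substitute a canned reader that replays fixed answers:
canned <- local({
  answers <- c("1", "3", "2")
  i <- 0
  function(prompt) { i <<- i + 1; answers[i] }
})
res <- collect_codes(c("1uid", "2uid", "3uid"), read_fun = canned)
res
```

Because the reader is injected, the same loop can be exercised without a human at the keyboard.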
Re: [R] convert data frame values
Adrian

If test[,2] is numeric then

test[,2] <- -test[,2]

should be all you need. If it isn't numeric you'll need to convert it first; e.g.,

test[,2] <- -as.numeric(as.character(test[,2]))

which can, of course, be converted back to the original class. HTH ... Peter Alspach

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adrian Johnson Sent: Friday, 4 April 2008 10:58 a.m. To: r-help Subject: [R] convert data frame values Hello: How can I interchange the signs of numeric values in a data frame? [...] thanks Adrian
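A minimal demonstration of the suggestion above, rebuilding the example data frame from the post (assuming column s really is numeric):

```r
test <- data.frame(f = c("a", "b", "c"),
                   s = c(1, 1, -1),
                   t = c(-1, -3, 1))

# Unary minus is vectorized, so this flips the sign of every value in 's'
test$s <- -test$s
test
#   f  s  t
# 1 a -1 -1
# 2 b -1 -3
# 3 c  1  1
```

If s had been read in as a factor (common with older read.table defaults), the as.numeric(as.character(...)) conversion mentioned above is needed before negating.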
Re: [R] How to ask for *fixed* number of distributions under parameterized Gaussian mixture model.
Hi, Chen, I don't know how you are doing it; however, to my limited knowledge it is easy with the flexmix package. On Thu, Apr 3, 2008 at 6:26 AM, Hung-Hsuan Chen (Sean) [EMAIL PROTECTED] wrote: Dear R users: I am wondering how to ask for a *fixed* number of distributions under a parameterized Gaussian mixture model. I know that em() and some related functions can fit the parameterized Gaussian mixture model. However, there seems to be no parameter to set the number of distributions to be mixed (if we know the value in advance). That is, assume I know the (mixed) data come from 3 different distributions. The output, however, may indicate that the number of distributions that form the data is 4. How can I specify in advance that the number of distributions is 3? Thanks a lot for your help. -- === WenSui Liu ChoicePoint Precision Marketing Phone: 678-893-9457 Email: [EMAIL PROTECTED] Blog: statcompute.spaces.live.com
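To make the "fixed number of components" idea concrete without depending on any package (mclust's Mclust(x, G = 3) and flexmix's k argument are the packaged routes), here is a bare-bones EM sketch for a one-dimensional Gaussian mixture where k is fixed by construction. This is my own illustration, not the em() the poster refers to:

```r
# Minimal EM for a univariate Gaussian mixture with a fixed number of
# components k; crude quantile initialisation, no convergence checks
em1d <- function(x, k, iter = 200) {
  mu <- quantile(x, probs = (1:k) / (k + 1), names = FALSE)
  s  <- rep(sd(x), k)
  w  <- rep(1 / k, k)
  n  <- length(x)
  for (it in 1:iter) {
    # E step: posterior responsibility of each component for each point
    dens <- sapply(1:k, function(j) w[j] * dnorm(x, mu[j], s[j]))
    resp <- dens / rowSums(dens)
    # M step: weighted updates of weights, means, and sds
    nk <- colSums(resp)
    w  <- nk / n
    mu <- colSums(resp * x) / nk
    s  <- sqrt(colSums(resp * (x - rep(mu, each = n))^2) / nk)
  }
  list(weight = w, mean = mu, sd = s)
}

set.seed(42)
x <- c(rnorm(200, 0, 1), rnorm(200, 5, 1), rnorm(200, 10, 1))
fit <- em1d(x, k = 3)   # exactly 3 components, by construction
sort(fit$mean)          # close to the true means 0, 5, 10
```

The point is simply that k never changes during the fit; packaged implementations do the same when you pass the number of components explicitly.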
[R] newbie subset question
I want to look at all the records making up a row and column of a crosstab, but I'm not getting it right. I'm trying to use subset() but my selection ((prod_act=="other") || (attr_act=="other")) gives me no records. See the second table below. Getting just the row does work, as seen in the third table. Why is this failing me?

.Table <- xtabs(~prod_act+attr_act, data=Dataset) ; .Table ; remove(.Table)

attr_act
prod_act dk/rf home other school shop soc/rec university work
dk/rf 52 0 0 0 0 0 0 0
home 0 12380 2826552 12041054 2709 15842
other 0 0 703 1 62 48 42 227
school 0 0 7187 4 11 0 8
shop 0 029 1 466 17 730
soc/rec 0 039 122 191 840
university 0 067 040 38247 119
work 0 0 215 495 76107 1581

D2 <- subset(Dataset, ((prod_act=="other") || (attr_act=="other")), select=direction:attr_mta_taz)
.Table <- xtabs(~prod_act+attr_act, data=D2) ; .Table ; remove(.Table)

attr_act
prod_act dk/rf home other school shop soc/rec university work
dk/rf 0 0 0 0 0 0 0 0
home 0 0 0 0 0 0 0 0
other 0 0 0 0 0 0 0 0
school 0 0 0 0 0 0 0 0
shop 0 0 0 0 0 0 0 0
soc/rec 0 0 0 0 0 0 0 0
university 0 0 0 0 0 0 0 0
work 0 0 0 0 0 0 0 0

D3 <- subset(Dataset, (prod_act=="other"), select=direction:attr_mta_taz)
.Table <- xtabs(~prod_act+attr_act, data=D3) ; .Table ; remove(.Table)

attr_act
prod_act dk/rf home other school shop soc/rec university work
dk/rf 0 0 0 0 0 0 0 0
home 0 0 0 0 0 0 0 0
other 0 0 703 1 62 48 42 227
school 0 0 0 0 0 0 0 0
shop 0 0 0 0 0 0 0 0
soc/rec 0 0 0 0 0 0 0 0
university 0 0 0 0 0 0 0 0
work 0 0 0 0 0 0 0 0

Robert Farley Metro 1 Gateway Plaza Mail Stop 99-23-7 Los Angeles, CA 90012-2952 Voice: (213)922-2532 Fax: (213)922-2868 www.Metro.net
Re: [R] coding for categorical variables with unequal observations
Also, since I just started to use R, I have trouble generating and understanding some of the code, especially choosing the correct options. Thanks! tanya On Thu, Apr 3, 2008 at 3:54 PM, Tanya Yatsunenko [EMAIL PROTECTED] wrote: Hi, I am doing multiple regression, and have several X variables that are categorical. I read that I can use dummy or contrast codes for that, but are there any special rules when there are unequal numbers of observations in each group (4 females vs 7 males in a gender variable)? Also, can R generate these codes for me? Thanks. -- Tanya
Re: [R] newbie subset question
Use | instead of ||.

On Thu, Apr 3, 2008 at 6:27 PM, Farley, Robert [EMAIL PROTECTED] wrote: I want to look at all the records making up a row and column of a crosstab, but I'm not getting it right. I'm trying to use subset() but my selection ((prod_act=="other") || (attr_act=="other")) gives me no records. See the second table below. Getting just the row does work, as seen in the third table. Why is this failing me? [...] Robert Farley Metro 1 Gateway Plaza Mail Stop 99-23-7 Los Angeles, CA 90012-2952 Voice: (213)922-2532 Fax: (213)922-2868 www.Metro.net
-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
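The distinction is worth spelling out: | is vectorized and produces one TRUE/FALSE per row, which is what subset() needs, whereas || examines only single values (it is meant for if() conditions, and recent R versions raise an error when it is given vectors of length greater than one). A small illustration with made-up data in the shape of the question:

```r
d <- data.frame(prod_act = c("home", "other", "shop"),
                attr_act = c("other", "home", "home"))

# Vectorized OR: one logical per row
keep <- d$prod_act == "other" | d$attr_act == "other"
keep                                                    # TRUE TRUE FALSE
subset(d, prod_act == "other" | attr_act == "other")    # rows 1 and 2

# '||' is scalar-only; on vectors it historically looked at just the
# first elements, and R >= 4.3 makes it an error outright
try(d$prod_act == "other" || d$attr_act == "other")
```

With ||, the whole condition collapses to a single value, which subset() then recycles over every row, silently keeping all rows or none.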
[R] [R-pkgs] ggplot2 - version 0.6
ggplot2

ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and avoid the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics. Find out more at http://had.co.nz/ggplot2, and check out the over 400 examples of ggplot in use.

ggplot 0.6

The two big changes in this release are improved documentation and legends:

* all major ggplot2 components now have their own built-in documentation, so that (e.g.) ?stat_smooth or ?geom_point now gives you useful information
* the legend code is now considerably more sophisticated and will attempt to merge together legends for the same variable
* legends are drawn based on the geoms used (instead of the scales used, as previously), so they should match the plot much better (e.g. for geom_smooth, geom_boxplot, geom_vline, geom_abline, geom_pointrange)

These features are new, so there are likely to be a few bugs that I haven't discovered. Please let me know if you do find any.
Other additions and corrections:

* coord_equal: should now work correctly in all situations
* coord_polar: add start and direction parameters, giving more control over the layout of the polar coords
* coord_polar: added my favourite pie chart example
* coord_trans: now deals with groups appropriately, at the cost of decreased speed
* geom_abline, geom_vline, geom_hline: should now behave better in a wider variety of settings
* geom_boxplot: deals with continuous x-axis and grouping much better
* geom_boxplot: now has its own legend which actually looks like a boxplot
* geom_boxplot: reports if missing values removed
* geom_crossbar: the middle line is now displayed thicker than the other lines, controlled by the parameter fatten (thanks to Heike Hofmann for the suggestion)
* geom_density: fix scale adjustment bug in geom_density
* geom_line, geom_text: all size measurements (now lines and text as well) are measured in mm; lines/paths default to 0.5 mm wide
* geom_rug: new, to add marginal rug plots
* geom_smooth: added example showing how to use geom_smooth with your own models
* geom_smooth: fixed bug where if se=FALSE the x axis always includes 0
* geom_vline, geom_hline: yet another rewrite which should make them more powerful and less error prone.
* ggsave: reports width and height of saved image
* position_stack: fixed bug when data was empty
* qplot: allow qplot to use computed aesthetics too
* scale_continuous: tweaks to minor breaks to make appearance better on wider range of coordinate systems
* scale_discrete: all discrete scales now have a labels argument which you can use to override the factor levels
* scale_discrete: now works correctly with character vectors
* scale_size: changed default range to [0.5, 3] to better reflect new sizing decisions
* scale_size: legends resize to avoid overlaps
* scale_x_continuous, scale_y_continuous: new convenience functions xlim and ylim (and zlim) that make it even easier to adjust the limits of the x, y, and z axes
* stat_bin, geom_area: fixed bug in the combination of stat_bin and geom_area that made it difficult to draw frequency polygons
* stat_bin: fixed bug which resulted in increased counts when the x axis was a categorical variable with a single level (thanks to Bob Muenchen for pointing this out!)
* stat_bin: no longer incorrectly warns that binwidth is unspecified when breaks are set
* stat_bin: now takes origin argument to manually specify origin of first bin (default is round_any(min(range), bin_width, floor))
* stat_boxplot, stat_contour, stat_density2d, stat_qq, stat_density: na.rm parameter added to these statistics (thanks to Leena Choi for suggesting this)
* stat_function: new, makes it easy to superimpose a function on the plot
* stat_qq: axes flipped to agree with base R
* stat_qq: now uses sample aesthetic to select variable for summary
* stat_quantile: updated to work with latest version of quantreg
* stat_spoke: new, to make it possible to use geom_segment parameterised by angle and radius (thanks to Jiho for the suggestion)
* stat_summary: better documentation
* stat_summary: convenient auto wrapping of simple summary functions

Miscellaneous changes:

* it's now easy to change the default scales (and their arguments) with the set_default_scale function; see ?set_default_scale for more details (thanks to Bob Muenchen for the suggestion)
* new order aesthetic which controls the order in which elements are plotted
* min and max are now scaled the same way as y
* functions are silently dropped (e.g. aes(colour=col))
* scales do not try and map variables that don't exist (fixes some rather obscure bugs)
[R] markov switching model
Excuse me, I need your help. Let me introduce myself: my name is Harsugi, and I have just graduated from Gadjah Mada University, Indonesia. I have several questions about using R.

1. I am trying to fit a Markov switching model with R, but I do not know whether there are any packages that cover it. Can you tell me if any do?

2. When I read some papers, I saw that most of them use Markov chain Monte Carlo methods. I do not know how to implement this procedure in R. Could you tell me how, and which package I should use?

Please send your responses/answers to this e-mail address; I will be very glad to receive and read your answers. Thank you, Harsugi - [[elided Yahoo spam]]
[R] looking for a CDF of bivariate noncentral Chisquare
Hi, I would like to know if there is a program written in R to get the CDF (cumulative distribution function) of a bivariate non-central chi-square distribution. Hope someone will reply. Thank you, Rossita M Yunus [EMAIL PROTECTED] This email (including any attached files) is confidentia...{{dropped:19}}
Re: [R] coding for categorical variables with unequal observations
I think you need to do some background reading. R does this automatically for you, and offers many options for how it is done. (That you have asked suggests you have missed that.) The account by Bill Venables in chapter 6 of MASS (the book, see the R FAQ) is regarded as the most comprehensive available. On Thu, 3 Apr 2008, Tanya Yatsunenko wrote: Also, since I just started to use R, I have trouble generating and understanding some of the code, especially choosing the correct options. Thanks! tanya On Thu, Apr 3, 2008 at 3:54 PM, Tanya Yatsunenko [EMAIL PROTECTED] wrote: Hi, I am doing multiple regression, and have several X variables that are categorical. I read that I can use dummy or contrast codes for that, but are there any special rules when there are unequal numbers of observations in each group (4 females vs 7 males in a gender variable)? Also, can R generate these codes for me? Thanks. -- Tanya -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
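To make the point concrete: declare the variable a factor and R builds the coding for you when you fit the model; model.matrix() shows the dummy columns it generates, unequal group sizes and all. A small sketch with made-up data (4 females, 7 males, as in the question):

```r
set.seed(7)
d <- data.frame(gender = factor(rep(c("F", "M"), times = c(4, 7))),
                y = rnorm(11))

# Default treatment contrasts: an intercept plus one 0/1 dummy for level "M"
mm <- model.matrix(~ gender, data = d)
head(mm)

fit <- lm(y ~ gender, data = d)
coef(fit)   # (Intercept) is the F-group mean; genderM is the M - F difference

# Other codings are one option away, e.g. sum-to-zero ("effects") coding:
model.matrix(~ gender, data = d, contrasts.arg = list(gender = "contr.sum"))
```

The unequal group sizes (4 vs 7) need no special handling; they simply show up as unequal numbers of 0s and 1s in the design matrix.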