Re: [R] What solve() does?
On Wed, Sep 1, 2010 at 5:36 AM, Petar Milin <pmi...@ff.uns.ac.rs> wrote:
> Hello! Can anyone explain what the solve() function does: Gaussian elimination, or an iterative, numeric solve? In addition, I would need both the Gaussian elimination and the iterative solution for a course. Are the two built into R? Thanks! PM

Hello, Petar:

I think you are assuming that solve() uses an elementary linear algebra paper-and-pencil procedure, but I don't think it does. In a digital computer, those procedures are not numerically precise, and I believe the folks here will even say you shouldn't use solve() to get an inverse, though I can't remember all of the details.

To see how solve() works, let me show you a trick I just learned. Read ?solve and notice that solve is a generic method, meaning it does not actually do the calculations itself. Rather, there are specific implementations for different types of cases. To find the implementations, run methods(solve). I get:

> methods(solve)
[1] solve.default solve.qr

Then if you want to read HOW solve does what it does (which I think was your question), print either of these:

solve.default
solve.qr

In that code, you will see that the chosen procedure depends on the linear algebra libraries you make available. I'm no expert on the details, but it appears QR decomposition is the preferred method. You can read about that online or in numerical linear algebra books.

-- Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
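To make the distinction concrete, here is a minimal sketch (my own addition, not from the original thread) comparing the default solve() with an explicit QR route via qr.solve(); on a well-conditioned system the two agree to floating-point tolerance:

```r
## A small, well-conditioned system A x = b
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)

## solve.default hands the work to compiled linear algebra routines
x1 <- solve(A, b)

## Explicit QR decomposition route
x2 <- qr.solve(A, b)

## Both agree up to numerical tolerance
all.equal(x1, x2)
```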
[R] Please explain do.call in this context, or critique to stack this list faster
I've been doing some consulting with students who seem to come to R from SAS. They are usually preoccupied with do-loops, and it is tough to persuade them to trust R lists rather than keeping hundreds of named matrices floating around. Often there is a list with lots of matrices or data frames in it and we need to stack them together. I thought it would be a simple thing, but it turns out there are several ways to get it done, and in this case the most elegant way, using do.call, is not the fastest, but it does appear to be the least prone to programmer error.

I have been staring at ?do.call for quite a while and I have to admit that I just need some more explanation in order to interpret it. I can't really see why do.call(rbind, mylist) works, but sapply(mylist, rbind) does not.

Anyway, here's a self-contained working example that compares the speed of various approaches. If you send yet more ways to do this, I will add them on and then post the result to my Working Example collection.

## stackMerge.R
## Paul Johnson <pauljohn at ku.edu>
## 2010-09-02

## rbind is neat, but how to apply it to a lot of
## data frames?

## Here is a test case
df1 <- data.frame(x=rnorm(100), y=rnorm(100))
df2 <- data.frame(x=rnorm(100), y=rnorm(100))
df3 <- data.frame(x=rnorm(100), y=rnorm(100))
df4 <- data.frame(x=rnorm(100), y=rnorm(100))
mylist <- list(df1, df2, df3, df4)

## Usually we have done a stupid
## loop to get this done
resultDF <- mylist[[1]]
for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])

## My intuition was that this should work:
## lapply(mylist, rbind)
## but no! It just makes a new list

## This obliterates the columns:
## unlist(mylist)

## I got this idea from code in the
## "complete" function in the mice package.
## It uses brute force to allocate a big matrix of 0's and
## then it places the individual data frames into that matrix.
m <- 4
nr <- nrow(df1)
nc <- ncol(df1)
dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]

## I searched a long time for an answer that looked better.
## This website is helpful:
## http://stackoverflow.com/questions/tagged/r
## I started to type in the question and 3 plausible answers
## popped up before I could finish.

## The terse answer is:
shortAnswer <- do.call(rbind, mylist)

## That's the right answer, see:
shortAnswer == dataComplete

## But I don't understand why it works.
## More importantly, I don't know if it is fastest, or best.
## It is certainly less error prone than dataComplete.

## First, make a bigger test case and use system.time to evaluate
phony <- function(i){
  data.frame(w=rnorm(1000), x=rnorm(1000), y=rnorm(1000), z=rnorm(1000))
}
mylist <- lapply(1:1000, phony)

### First, try the terse way
system.time( shortAnswer <- do.call(rbind, mylist) )

### Second, try the complete way:
m <- 1000
nr <- nrow(df1)
nc <- ncol(df1)
system.time(
  dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
)
system.time(
  for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
)

## On my Thinkpad T62 dual core, the shortAnswer approach takes about
## three times as long:

## > system.time( bestAnswer <- do.call(rbind, mylist) )
##    user  system elapsed
##  14.270   1.170  15.433

## > system.time(
## +   dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
## + )
##    user  system elapsed
##   0.000   0.000   0.006

## > system.time(
## +   for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
## + )
##    user  system elapsed
##   4.940   0.050   4.989

## That makes the do.call way look slow, and I said "hey,
## our stupid for loop at the beginning may not be so bad."
## Wrong. It is a disaster. Check this out:

## > resultDF <- phony(1)
## > system.time(
## +   for (i in 2:1000) resultDF <- rbind(resultDF, mylist[[i]])
## + )
##    user  system elapsed
## 159.740   4.150 163.996

-- Paul E.
Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
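To restate the puzzle in the post as a tiny experiment (a sketch of my own, not part of the original message): do.call() builds one call containing every list element as an argument, while lapply()/sapply() invoke the function once per element, so each rbind() sees only a single data frame and simply returns it.

```r
mylist <- list(data.frame(x = 1:2), data.frame(x = 3:4))

## do.call constructs rbind(mylist[[1]], mylist[[2]]): one call, all pieces
stacked <- do.call(rbind, mylist)
nrow(stacked)      # 4: the frames are stacked

## lapply calls rbind(mylist[[1]]), then rbind(mylist[[2]]): one piece per
## call, so each call returns its input unchanged and we get a list back
unstacked <- lapply(mylist, rbind)
length(unstacked)  # still 2 separate data frames
```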
Re: [R] Save data as .pdf or .JPG
On Wed, Sep 1, 2010 at 7:56 AM, khush <bioinfo.kh...@gmail.com> wrote:
> Hi all, I have the following script to plot some data.
>
> plot(c(1,1100), c(0,15), type='n', xlab='', ylab='', ylim=c(0.1,25), las=2)
> axis(1, at = seq(0,1100,50), las=2)
> axis(2, at = seq(0,25,1), las=2)
>
> When I source("script.R"), I get the image on screen, but I do not want to use the screenshot option to save the image. How can I save the output in .pdf or .jpg format?
>
> Thank you
> Khushwant

Hi! This is one of the things that is difficult for newcomers. I've written down a pretty thorough answer:

http://pj.freefaculty.org/R/Rtips.html#5.2

pj

-- Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
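For the archive, the gist of that answer: open a file device, issue the plot commands, then close the device. A minimal sketch using the poster's own commands (the file name is made up):

```r
## Send the plot to a PDF file instead of the screen
pdf("myplot.pdf", height = 6, width = 6)
plot(c(1, 1100), c(0, 15), type = "n", xlab = "", ylab = "",
     ylim = c(0.1, 25), las = 2)
axis(1, at = seq(0, 1100, 50), las = 2)
axis(2, at = seq(0, 25, 1), las = 2)
dev.off()  # writes and closes the file

## For JPEG output, open with jpeg("myplot.jpg") instead
```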
Re: [R] coxph and ordinal variables?
Run it with factor() instead of ordered(). You don't want the orthogonal polynomial contrasts that result from ordered() if you need to compare against Stata. I attach an R program that I wrote to explore ordered factors a while ago. I believe this will clear everything up if you study the examples.

pj

On Wed, Sep 8, 2010 at 5:43 PM, Min-Han Tan <minhan.scie...@gmail.com> wrote:
> Dear R-help members,
>
> Apologies - I am posting on behalf of a colleague, who is a little puzzled because Stata and R seem to be yielding different survival estimates for the same dataset when treating a variable as ordinal. (ordered() is used to represent an ordinal variable.) I understand that R's coxph (by default) uses the Efron approximation, whereas Stata uses (by default) the Breslow, but we did compare using the same approximations. I am wondering if this is a result of how coxph handles an ordered factor?
>
> Essentially, this is a survival dataset using tumor grade (1, 2, 3 and 4) as the risk factor. This is more of an 'ordinal' variable than a continuous variable. For the same data set of 399 patients, when treating the vector of tumor grade as a continuous variable (range of 1 to 4), testing the Efron and the Breslow approximations yields the same result in both R and Stata. However, when Hist_Grade_4grp is converted into an ordered factor using ordered(), and the same scripts are applied, rather different results are obtained relative to the Stata output. This was tested across the different approximations, with consistent results. The comparison using the Efron approximation and ordinal data is below. Your advice is very much appreciated!
>
> Min-Han
>
> Apologies below for the slightly misaligned output.
STATA output:

. xi: stcox i.Hist_Grade_4grp, efr
i.Hist_Grade_~p   _IHist_Grad_1-4   (naturally coded; _IHist_Grad_1 omitted)

failure _d:       FFR_censor
analysis time _t: FFR_month

Iteration 0:  log likelihood = -1133.369
Iteration 1:  log likelihood = -1129.4686
Iteration 2:  log likelihood = -1129.3196
Iteration 3:  log likelihood = -1129.3191
Refining estimates:
Iteration 0:  log likelihood = -1129.3191

Cox regression -- Efron method for ties

No. of subjects =  399          Number of obs = 399
No. of failures =  218
Time at risk    =  9004.484606
LR chi2(3)      =  8.10
Log likelihood  = -1129.3191    Prob > chi2 = 0.0440

          _t | Haz. Ratio  Std. Err.     z   P>|z|  [95% Conf. Interval]
-------------+----------------------------------------------------------
_IHist_Gra~2 |   1.408166   .3166876  1.52   0.128   .9062001   2.188183
_IHist_Gra~3 |   1.69506    .3886792  2.30   0.021   1.081443   2.656847
_IHist_Gra~4 |   2.540278   .9997843  2.37   0.018   1.17455    5.49403

R output, using:

summary( coxph( Surv(FFR_month, FFR_censor) ~ Hist_Grade_4grp, method = "breslow"))
summary( coxph( Surv(FFR_month, FFR_censor) ~ Hist_Grade_4grp, method = "exact"))
summary( coxph( Surv(FFR_month, FFR_censor) ~ Hist_Grade_4grp, method = "efron"))

n = 399 (21 observations deleted due to missingness)

                     coef exp(coef) se(coef)     z Pr(>|z|)
Hist_Grade_4grp.L 0.66685   1.94809  0.26644 2.503   0.0123 *
Hist_Grade_4grp.Q 0.03113   1.03162  0.20842 0.149   0.8813
Hist_Grade_4grp.C 0.08407   1.08771  0.13233 0.635   0.5252
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                  exp(coef) exp(-coef) lower .95 upper .95
Hist_Grade_4grp.L     1.948     0.5133    1.1556     3.284
Hist_Grade_4grp.Q     1.032     0.9693    0.6857     1.552
Hist_Grade_4grp.C     1.088     0.9194    0.8392     1.410

Rsquare = 0.02  (max possible = 0.997)
Likelihood ratio test = 8.1  on 3 df,  p=0.044
Wald test             = 8.02 on 3 df,  p=0.0455
Score (logrank) test  = 8.2  on 3 df,  p=0.04202
-- Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
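The crux of the answer, that ordered() swaps treatment dummies for orthogonal polynomial contrasts, can be inspected directly. A minimal sketch (mine, not from the thread):

```r
grade <- c(1, 2, 3, 4)

## Unordered factor: treatment (dummy) contrasts,
## comparable to what Stata's i. prefix produces
contrasts(factor(grade))

## Ordered factor: orthogonal polynomial contrasts, the
## .L, .Q, .C terms seen in the coxph output above
contrasts(ordered(grade))
```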
[R] Fast / dependable way to stack together data frames from a list
Hi, everybody:

I asked about this in r-help last week and promised a summary of answers. Special thanks to the folks who helped me understand do.call and pointed me toward plyr.

We face this problem all the time. A procedure generates a list of data frames. How do we stack them together? The short answer is that the plyr package's rbind.fill function is probably the fastest method that is not prone to trouble and does not require much user caution.

result <- rbind.fill(mylist)

A slower alternative that also works is

result <- do.call(rbind, mylist)

That is always available in R and it works well enough, even though it is not quite as fast. Both of these are much faster than a loop that repeatedly applies rbind. Truly blazing speed can be found if we convert the list elements into matrices, but that is not possible if the list actually contains data frames.

I've run this quite a few times, and the relative speed of the different approaches has never differed much. If you run this, I hope you will feel smarter, as I do! :)

## stackListItems.R
## Paul Johnson <pauljohn at ku.edu>
## 2010-09-07

## Here is a test case
df1 <- data.frame(x=rnorm(100), y=rnorm(100))
df2 <- data.frame(x=rnorm(100), y=rnorm(100))
df3 <- data.frame(x=rnorm(100), y=rnorm(100))
df4 <- data.frame(x=rnorm(100), y=rnorm(100))
mylist <- list(df1, df2, df3, df4)

## Here's the way we have done it. We understand this,
## we believe the result, it is easy to remember. It is
## also horribly slow for a long list.
resultDF <- mylist[[1]]
for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])

## It works better to just call rbind once, as in:
resultDF2 <- rbind(mylist[[1]], mylist[[2]], mylist[[3]], mylist[[4]])

## That is faster because it calls rbind only once.
## But who wants to do all of that typing? How tiresome.
## Thanks to Erik Iverson in r-help, I understand that
resultDF3 <- do.call(rbind, mylist)
## is doing the EXACT same thing.
## Erik explained that do.call(rbind, mylist)
## is *constructing* a function call from the list of arguments.
## It is shorthand for rbind(mylist[[1]], mylist[[2]], mylist[[3]]),
## assuming mylist has 3 elements.

## Check the result:
all.equal(resultDF2, resultDF3)

## You often see people claim it is fast to allocate all
## of the required space in one shot and then fill it in.
## I got this algorithm from code in the
## "complete" function in the mice package.
## It allocates a big matrix of 0's and
## then it places the individual data frames into that matrix.
m <- 4
nr <- nrow(df1)
nc <- ncol(df1)
resultDF4 <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
for (j in 1:m) resultDF4[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]

## This is a bit error prone for my taste. If the data frames have
## different numbers of rows, some major code surgery will be needed.

## Dennis Murphy pointed out the plyr package, by Hadley Wickham.
## Dennis said: "ldply() in the plyr package. The following is the
## same idea as do.call(rbind, l), only faster."
library(plyr)
resultDF5 <- ldply(mylist, rbind)
all.equal(resultDF, resultDF5)

## Plyr author Hadley Wickham followed up with "I think all you
## want here is rbind.fill":
resultDF6 <- rbind.fill(mylist)
all.equal(resultDF, resultDF6)

## Gabor Grothendieck noted that if the elements in mylist were
## matrices, this would all work faster.
mylist2 <- lapply(mylist, as.matrix)
matrixDoCall <- do.call(rbind, mylist2)
all.equal(as.data.frame(matrixDoCall), resultDF)

## Gabor also showed a better way than system.time to find out how
## long this takes on average, using the rbenchmark package. Awesome!
# library(rbenchmark)
# benchmark(
#   df = do.call(rbind, mylist),
#   mat = do.call(rbind, mylist2),
#   order = "relative", replications = 250
# )

## To see the potentially HUGE impact of these changes, we need to
## make a bigger test case. I just used system.time to evaluate, but
## if this involved a close call, I'd use rbenchmark.
phony <- function(i){
  data.frame(w=rnorm(1000), x=rnorm(1000), y=rnorm(1000), z=rnorm(1000))
}
mylist <- lapply(1:1000, phony)

### First, try my usual way
resultDF <- mylist[[1]]
system.time(
  for (i in 2:1000) resultDF <- rbind(resultDF, mylist[[i]])
)
## wow, that's slow:
##    user  system elapsed
## 168.040   4.770 173.028

### Now the do.call method:
system.time( resultDF3 <- do.call(rbind, mylist) )
all.equal(resultDF, resultDF3)
## Faster! Takes one-twelfth as long:
##    user  system elapsed
##   14.64    0.85   15.49

### Third, my adaptation of the "complete" function in the mice
### package:
m <- length(mylist)
nr <- nrow(mylist[[1]])
nc <- ncol(mylist[[1]])
system.time(
  resultDF4 <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
)
colnames(resultDF4) <- colnames(mylist[[1]])
system.time(
  for (j in 1:m) resultDF4[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
)
all.equal(resultDF, resultDF4)
## Disappointingly slow on the big case:
##    user  system elapsed
##  80.400   3.970  84.573

### That took much longer than I expected. Gabor's
### hint
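One reason rbind.fill() counts as "not prone to trouble": unlike do.call(rbind, ...), it tolerates list elements whose columns differ, filling the gaps with NA. A minimal sketch (my own), assuming the plyr package is installed:

```r
library(plyr)

dfA <- data.frame(x = 1:2, y = c(10, 20))
dfB <- data.frame(x = 3:4)        # no 'y' column

## do.call(rbind, list(dfA, dfB)) would stop with an error here;
## rbind.fill pads the missing column with NA instead
result <- rbind.fill(dfA, dfB)
result$y   # 10 20 NA NA
```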
Re: [R] coxph and ordinal variables?
Hi, everybody:

On Wed, Sep 8, 2010 at 5:43 PM, Min-Han Tan <minhan.scie...@gmail.com> wrote:

David said my R code text attachment got rejected by the mailing list. Pooh. I don't think that's nice. I don't see anything in the posting guide about a limit on text attachments. Well, if you are still trying to understand what "orthogonal polynomial" means, I suggest you run the following through R. I thought it was an enlightening experience.

# Paul Johnson <paulj...@ku.edu> Nov. 16, 2005
# Ordinal predictors with a small number of possible values.
# Here is R code and commentary about ordinal predictors.
# Please let me know if you have insights to explain this more clearly.

set.seed(19)
sampleSize <- 100
subgroupSize <- sampleSize/5

# One of those 5-point opinion questions: Strongly Disagree ...
# Strongly Agree, with values assigned 1, 3, 5, 7, 9
surveyQuestion <- c(rep(1, subgroupSize), rep(3, subgroupSize),
                    rep(5, subgroupSize), rep(7, subgroupSize),
                    rep(9, subgroupSize))

### Make this an unordered factor
myufac <- factor(surveyQuestion)
### Study the contrasts:
contrasts(myufac)

### Make an ordered factor
myofac <- ordered(surveyQuestion)
contrasts(myofac)

# another independent variable
x <- rnorm(sampleSize)
# a random error
e <- rnorm(sampleSize)

# First, suppose the outcome is really just a
# linear reflection of surveyQuestion. It is created
# by multiplying against the evenly spaced values
# observed in the survey
y1 <- -5 + 4*surveyQuestion + 6*x + 10*e

mod0 <- lm(y1 ~ x + surveyQuestion)
summary(mod0)

# Question: are you making a mistake by just throwing
# surveyQuestion into the analysis?
# You should not be making a mistake, because you (luckily) guessed
# the right model.

# You might check by running the model with the unordered factor,
# which amounts to running dummy variables
mod1 <- lm(y1 ~ x + myufac)
summary(mod1)

# or with the ordered factor, which estimates the orthogonal
# polynomials
mod2 <- lm(y1 ~ x + myofac)
summary(mod2)

# Does either of those models appear to be better than
# the original mod0?

# If I did not know for sure the factor was ordered, I'd
# be stuck with the treatment contrasts in mod1. And what
# I would really like to know is this: are the coefficients
# in mod1 stepping up in a clear, ordinal, evenly spaced pattern?

# Since we know the data is created with a coefficient of 4,
# we would expect the coefficients to step up like this:
# myufac3   8
# myufac5  16
# myufac7  24
# myufac9  32

# See why we expect this pattern? The intercept gobbles up myufac1,
# so each impact of surveyQuestion has to be reduced by 4 units.
# The impact of myufac3, which you might expect to be 3*4 = 12, is
# reduced to 3*4 - 4 = 8, and so forth.

# But that does not happen with a smallish sample. You can run this
# code a few times. It appears to me that a sample of more than
# 10,000 is necessary to get any faithful reproduction of the steps
# between values of surveyQuestion.

# Question: would we mistakenly think that the unevenness observed in
# summary(mod1) is evidence of the need to treat surveyQuestion as a
# nominal factor, even though we know (since we created the data) that
# we might as well just throw surveyQuestion in as a single variable?
# How to decide?

# We need a hypothesis test of the conjecture that
# the coefficient estimates in mod1 fall along a line,
# i.e., myufac-i = (b2 * i) - b2.
# I believe that the correct test results from this command:
anova(mod0, mod1, test="Chisq")

# If that is significant, it means you are losing predictive
# power by throwing in surveyQuestion as if it were a numerical
# variable.
# Now, what if we are sure that the data gathered in surveyQuestion is
# really ordinal, and so we estimate mod2.
# What do those parameters mean? Here I'm just reasoning/guessing,
# because I cannot find any complete idiot's guide to orthogonal
# polynomials and their use in regression and/or R.

# Take a look at the contrasts themselves
ord1 <- contrasts(myofac)
ord1
#              .L         .Q            .C          ^4
# 1 -6.324555e-01  0.5345225 -3.162278e-01   0.1195229
# 3 -3.162278e-01 -0.2672612  6.324555e-01  -0.4780914
# 5 -3.287978e-17 -0.5345225  1.595204e-16   0.7171372
# 7  3.162278e-01 -0.2672612 -6.324555e-01  -0.4780914
# 9  6.324555e-01  0.5345225  3.162278e-01   0.1195229

# What does this mean? I believe: we act as though there are 4
# independent variables in the regression, L, Q, C, and ^4, and for
# each respondent in the dataset we choose a row of these numbers
# to act as independent variables. A person who
# answered 1 on the survey would have the input values (-0.63, 0.53,
# -0.32, 0.12) for those four variables.

# This begs the question: what are L, Q, C, and ^4?

### This is tough to explain.
# Background: Recall from calculus that any function can be
# approximated by a polynomial. Since surveyQuestion has only 5
# possible values, a polynomial of degree 4 is needed to replace it
# in a regression model. It can
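A quick check of what "orthogonal" means in these contrasts (my own addition to the tutorial): the polynomial columns are orthonormal, so the cross-product of the contrast matrix is the identity.

```r
cm <- contrasts(ordered(c(1, 3, 5, 7, 9)))

## Columns are mutually orthogonal and have unit length:
## t(cm) %*% cm is the 4x4 identity matrix, up to rounding
round(crossprod(cm), 10)
```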
Re: [R] Problems with pdf device using plot glht function on multcomp library.
Hi, Kenneth:

It is not clear whether you mean that your pdf output usually works but does not in this special case, or that this is a first effort with pdf. The answer might depend on which is the case.

If you are just getting started, can I refer you to some lecture notes I made about saving graphs in R?

http://pj.freefaculty.org/R/SummerCamp2010/PJ-Lectures/plotting2.pdf

If you cut off the file name at the end of that URL, you will see the list of the other lectures I prepared for our university's Summer Stats Camp in 2010.

If pdf does usually work, could I suggest you take this code and rearrange it? Where you have this:

pdf("plot1.pdf")
m1 <- aov(Nitratos ~ Descripcion - 1, data=Sx)
vect1 <- table(Sx$Descripcion)
K <- contrMat(vect1, base=4)
dnk <- glht(m1, linfct=K)
summary(dnk)
old.par <- par(no.readonly = TRUE)
par(mai=c(1,2,1.25,1), mgp=c(3,1,0))
print(plot(dnk, las=1, xlab=""))
print(abline(v=0, lty=2))
par(old.par)
dev.off()

instead, separate the stats part from the plot part:

m1 <- aov(Nitratos ~ Descripcion - 1, data=Sx)
vect1 <- table(Sx$Descripcion)
K <- contrMat(vect1, base=4)
dnk <- glht(m1, linfct=K)
summary(dnk)

pdf("plot1.pdf", onefile=FALSE, paper="special", height=6, width=6)
### old.par <- par(no.readonly = TRUE)
par(mai = c(1, 2, 1.25, 1), mgp = c(3,1,0))
plot(dnk, las = 1, xlab = "")
abline(v = 0, lty = 2)
### par(old.par)
dev.off()

I've made changes to cut your print statements and moved your pdf command down to sandwich the plot commands. I've added some embellishments which I almost guarantee will make your pdf output more useful. (Spaces are inserted around <- and after commas for readability. Check the R examples; they generally recommend that.)

To the R experts it may seem obvious, but to the rest of us, well, it is not completely so: you do not need to save the par values, because the par settings you change are instantly forgotten when you do dev.off(). So you don't need to worry about saving the default settings and then restoring them.
Run par() to see for yourself after dev.off(); par's defaults will be back where they were at the beginning. You would only need to do the "remember my par" thing if you were trying to put several graphs into a single output device, which you aren't. And I've made that clear by putting onefile=FALSE in the pdf statement. I'd try it without the par() call at all if there is still trouble in the output file.

I added the paper="special" option in the pdf command in order to make the margins in the output more reasonable. If you don't do that, the pdf output assumes you want a whole sheet of paper, so there will be some massive margins in your output.

Good luck.

-- Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
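The claim that par() settings die with the device is easy to verify; a minimal sketch (not part of the original reply, file names are temporaries):

```r
f1 <- tempfile(fileext = ".pdf")
pdf(f1)
par(mai = c(1, 2, 1.25, 1))   # change the margins on this pdf device
dev.off()

## A fresh device starts from its own defaults,
## so there is nothing to restore by hand
f2 <- tempfile(fileext = ".pdf")
pdf(f2)
par("mai")                    # back at the device default margins
dev.off()
```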
Re: [R] How to find the interception point of two linear fitted model in R?
On Wed, Oct 21, 2009 at 12:09 PM, FMH <kagba2...@yahoo.com> wrote:
> Dear All,
>
> Let's have 10 pairs of observations, as shown below.
>
> x <- 1:10
> y <- c(1,3,2,4,5,10,13,15,19,22)
> plot(x,y)
>
> Two fitted models, with ranges of [1,5] and [5,10], can easily be fitted separately by the lm function, as shown below:
>
> mod1 <- lm(y[1:5] ~ x[1:5])
> mod2 <- lm(y[5:10] ~ x[5:10])
>
> I'm looking for the interception point between the two straight lines from these fitted models, mod1 and mod2. Could someone advise me on the way to do it? Thank you
> Fir

Extract the coefficients, rearrange into a linear system, and use solve.

m1 <- as.matrix(rbind(coef(mod1), coef(mod2)))
a <- cbind(c(1,1), -m1[, 2])
a
#      [,1]      [,2]
# [1,]    1 -0.900000
# [2,]    1 -3.257143
b <- m1[, 1]
solve(a=a, b=b)
# [1] 4.396364 4.551515

This seems to visually verify the result:

plot(1:10, 1:10, type="n")
abline(mod1)
abline(mod2)

-- Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
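The same point can also be found without the matrix setup, since lines a1 + b1*x and a2 + b2*x cross where a1 + b1*x = a2 + b2*x. A minimal sketch (my own) reusing the poster's data:

```r
x <- 1:10
y <- c(1, 3, 2, 4, 5, 10, 13, 15, 19, 22)
mod1 <- lm(y[1:5] ~ x[1:5])
mod2 <- lm(y[5:10] ~ x[5:10])

a <- coef(mod1)   # intercept and slope of the first line
b <- coef(mod2)

## Solve a[1] + a[2]*x = b[1] + b[2]*x for x, then evaluate either line
xstar <- (b[1] - a[1]) / (a[2] - b[2])
ystar <- a[1] + a[2] * xstar
c(xstar, ystar)   # approximately 4.5515, 4.3964
```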
[R] spss imports--trouble with to.data.frame
will not be allowed in factors anymore
13: In `levels<-`(`*tmp*`, value = c("P", "L", "Kesk-Eesti", ... :
  duplicated levels will not be allowed in factors anymore
14: In `levels<-`(`*tmp*`, value = c("Galicia", "Principado de Asturias", ... :
  duplicated levels will not be allowed in factors anymore
15: In `levels<-`(`*tmp*`, value = c("Stockholm", "", "Sydsverige", ... :
  duplicated levels will not be allowed in factors anymore

> str(d2$HAPPY)
Factor w/ 13 levels "Extremely unhappy",..: NA NA NA NA NA NA NA NA NA NA ...

Oh, heck, all the values are missing!! Somehow, putting to.data.frame inside the read.spss call causes a different outcome than using as.data.frame after reading in the data. The symptoms of this in R-2.9 are a little different, but the conclusion is the same. Help?

In case you are a student who wants to work with this data, I can share the large script that I have been accumulating so that you might play along. It turns out to be surprisingly difficult to recode these factor variables that have levels like "none", 1, 2, ..., 9, "total".

## Paul Johnson
## November 13, 2009
## A question arose in the lab. A student asks: "I want
## to compare the answers from two different editions
## of the European Social Survey."
## I will add this to Stuff Worth Knowing later, but
## I can share this tutorial right now.

## From this website:
## http://ess.nsd.uib.no/ess
## Download those European Social Survey datasets into a directory.
## In a terminal, use the unzip command:
## unzip ESS3e03_2.spss.zip
## unzip ESS2e03_1.spss.zip
## Then run the following in R.

library(foreign)
d2 <- read.spss("ESS3e03_2.por", to.data.frame=TRUE)
d2 <- read.spss("ESS3e03_2.por")
warnings()
### You can try to go into a data frame in one
### step; that's an option in read.spss. But
### we saw warnings, and wanted to be careful.
d2 <- as.data.frame(d2)
d2$whichSurvey <- 2

d3 <- read.spss("ESS2e03_1.por")
d3 <- as.data.frame(d3)
d3$whichSurvey <- 3

namesd2 <- names(d2)
namesd3 <- names(d3)
commonNames <- intersect(namesd3, namesd2)

combod23 <- rbind(d2[ , commonNames], d3[ , commonNames])
save(combod23, file="combod23.Rda")

## Error
## Warning messages:
## 1: In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA, :
##   invalid factor level, NAs generated
## 2: In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA, :
##   invalid factor level, NAs generated
## 3: In `[<-.factor`(`*tmp*`, ri, value = c(1, 1, 1, 1, 1, 1, 1, 1, 1, :
##   invalid factor level, NAs generated

## That worries me a little bit. The warnings did too.
## Inspect a few lines in the result.
combod23[1:4, ]

## fix doesn't work for me; I did not bother to investigate.
## fix(combod23)
## Error in edit.data.frame(get(subx, envir = parent), title = subx, ...) :
##   can only handle vector and factor elements
## That means some data from hell came into this thing.
## I suspect that combod23 is OK.

## The memory use in this exercise is huge! Try to help it:
rm(d2)
rm(d3)

## But I worry. I have 2 ways that I use to try to figure this
## out. One is to open the dataset in a clone of SPSS called
## PSPP. (Actually, the executable is psppire.)
##
## The other thing I do is open the same data again in
## a numeric format, and compare the 2 combined data frames.
## This is also a useful exercise because it helps you
## understand what a factor is in R.
dn2 <- read.spss("ESS3e03_2.por", use.value.labels=FALSE)
dn2 <- as.data.frame(dn2)
dn2$whichSurvey <- 2

dn3 <- read.spss("ESS2e03_1.por", use.value.labels=FALSE)
dn3 <- as.data.frame(dn3)
dn3$whichSurvey <- 3

## Might be smart to compare
# dn2$HAPPY[1:50]
# d2$HAPPY[1:50]

namesdn2 <- names(dn2)
namesdn3 <- names(dn3)
commonNNames <- intersect(namesdn3, namesdn2)

combodn23 <- rbind(dn2[ , commonNNames], dn3[ , commonNNames])
save(combodn23, file="combodn23.Rda")

table(combod23$HAPPY, combodn23$HAPPY)

## In summary, whenever I want to use a variable from
## the combined data frame, I would probably compare
## against combodn23 just to feel safe.

## Note: when you come back to work on this project again, you
## might as well just reload the saved copies of combod23 and
## combodn23:
## load("combod23.Rda")
## load("combodn23.Rda")
## That will put you at the current spot; no need to redo the merge.

## Now, about recoding. If you just want numerical
## data, you might consider using combodn23.
## But if you want some factors and some numerical
## variables, then you might need to recode to reclaim
## values.

## HAPPY turns out to be an interesting example of a
## PAIN IN THE ASS, because in SPSS it is scored from
## 0 to 10, but they give value labels only for
## 1 = "Extremely unhappy"
## and
## 10 = "Extremely happy"
##
## And the SPSS column has no labels for values 1-9.
## If SPSS gave NO labels at all, then this would come
## into R as a numeric variable. BUT, because there are
## 2 levels named, R makes a factor out of it.
## When R turns it into a factor, you
## end up with a nutty
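The "nutty" outcome those comments lead up to is the standard trap with factors whose levels look like numbers. A minimal sketch (my own) of the wrong and the safe conversion:

```r
f <- factor(c("3", "10", "Extremely happy", "3"))

## Wrong: as.numeric on a factor returns internal level codes,
## not the numbers printed as labels
as.numeric(f)

## Safer: convert the labels themselves; non-numeric labels become NA
suppressWarnings(as.numeric(as.character(f)))   # 3 10 NA 3
```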
Re: [R] gregmisc library (Mandriva)
On Sat, Oct 3, 2009 at 9:18 AM, chi ball <c...@hotmail.it> wrote:
> Hi, I'm not able to find an rpm of the gregmisc library (2.0.0) for Linux Mandriva 2008 Spring. Any suggestion? Thanks

If you don't find an up-to-date RPM, either you have to learn how to build an RPM or just install the package yourself. You can install the package yourself in a number of ways; I think the R FAQ outlines them.

To update and download a whole bunch of packages, I use a script. It should be easy for you to see how this works. I scan the system to update the packages that are there, and then install a lot of others if they are not installed yet. As root, run

R CMD BATCH R_installFaves-2.R

or inside R, as root, you could type

source("R_installFaves-2.R")

On Ubuntu, if you do this as root it installs the packages into /usr/local/lib/R, but on Fedora it installs them under /usr/lib/R. I do not know where they will go with Mandriva.

I used to run the script to get ALL packages, but when the CRAN list accumulated more than 600 packages, my systems just spent all day building packages. So I had to narrow my sights. I'm looking at administering a cluster computer on which I'll need to make RPMs for many packages, so I'm in the same boat as you if you are wanting RPMs. You could check back with me in about a month to find out if I have packages for you.

pj

PS: I think gregmisc is a bundle, and bundles are deprecated. Instead, you install gdata, gmodels, and so forth.

-- Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
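The scan-then-install idea in that script can be sketched in a few lines; this is my own sketch, and the package names are just examples (they happen to be pieces of the old gregmisc bundle):

```r
## Which favorites are not installed yet?
wanted <- c("gdata", "gmodels", "gtools", "gplots")
missing <- setdiff(wanted, rownames(installed.packages()))
missing

## Then fetch what is absent and refresh the rest:
## install.packages(missing)
## update.packages(ask = FALSE)
```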
[R] plotmath vector problem; full program enclosed
Here's another example with my plotmath whipping boy, the Normal distribution. A colleague asks for a Normal density plotted above a series of axes that represent various other distributions (T, etc.).

I want to use vectors of expressions in plotmath to do this, but have run into trouble. Now I've isolated the problem down to a relatively small piece of working example code (below). If you would please run this, I think you will see the problem. When plotmath meets one vector of expressions, it converts all but one to math, so in the figure output I get, in LaTeX speak:

b1   $\mu - 1.0\sigma$   $\mu$   ...

All values except the first come out correctly. This happens only when I try to use bquote or substitute to get variables to fill in where the 1.96, 1.0, and so forth should be. In the figure output, you should see a second axis where all of the symbols are resolved correctly.

As usual, thanks in advance for your help. Sorry if I've made an obvious mistake or overlooked a manual.

### Filename: plotMathProblem.R
### Paul Johnson <paulj...@ku.edu> July 5, 2010

sigma <- 10.0
mu <- 4.0
myTitle1 <- "Normal density"  ## title for the plot below

myx <- seq(mu - 3.5*sigma, mu + 3.5*sigma, length.out=500)
myDensity <- dnorm(myx, mean=mu, sd=sigma)

### xpd needed to allow writing outside the strict box of the graph
### Need a big bottom margin to add several x axes
par(xpd=TRUE, ps=10, mar=c(18,2,2,2))

plot(myx, myDensity, type="l", xlab="", ylab="Probability Density",
     main=myTitle1, axes=FALSE)
axis(2, pos = mu - 3.6*sigma)
axis(1, pos = 0)
lines(c(myx[1], myx[length(myx)]), c(0,0))  ### closes off the axes

addInteriorLine <- function(x, m, sd){
  for (i in 1:(length(x))){
    lines(c(x[i], x[i]), c(0, dnorm(x[i], mean=m, sd=sd)), lty=14, lwd=.2)
  }
}

dividers <- c(qnorm(0.025), -1, 0, 1, qnorm(0.975))
addInteriorLine(mu + sigma*dividers, mu, sigma)

# bquote creates an expression that text plotters can use
t1 <- bquote(mu == .(mu))
mtext(bquote(mu == .(mu)), 1, at=mu, line=-1)

addInteriorLabel <- function(pos1, pos2, m, s){
  area <- abs(100*(pnorm(m + pos1*s, m, s) - pnorm(m + pos2*s, m, s)))
  mid <- m + 0.5*(pos1 + pos2)*s
  text(mid, 0.5*dnorm(mid, m, s), label=paste(round(area, 2), "%"))
}

addInteriorLabel(dividers[1], dividers[2], mu, sigma)
addInteriorLabel(dividers[2], dividers[3], mu, sigma)
addInteriorLabel(dividers[3], dividers[4], mu, sigma)
addInteriorLabel(dividers[4], dividers[5], mu, sigma)

### Following is the problem point: the axis will
### end up with correct labels, except for the first point,
### where we end up with "b1" instead of mu - 1.96*sigma.
b1 <- substitute(mu - d*sigma, list(d=round(dividers[1], 2)))
b2 <- substitute(mu - sigma)
b3 <- substitute(mu)
b4 <- substitute(mu + sigma)
b5 <- substitute(mu + d*sigma, list(d=round(dividers[5], 2)))
## plot(-20:50, -20:50, type="n", axes=FALSE)
axis(1, line=4, at=mu + dividers*sigma,
     labels=c(expression(b1), b2, b3, b4, b5), padj=-1)

### This gets the right result, but I have to hard code the dividers
b1 <- expression(mu - 1.96*sigma)
b2 <- expression(mu - sigma)
b3 <- expression(mu)
b4 <- expression(mu + sigma)
b5 <- expression(mu + 1.96*sigma)
axis(1, line=8, at=mu + dividers*sigma,
     labels=c(b1, b2, b3, b4, b5), padj=-1)

-- Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
Re: [R] how to save summary(lm) and anova (lm) in format?
There are R packages that can make nice regression tables in LaTeX documents. I've used memisc and it's good; there is also apsrtable and the old standby xtable. I also use my own function outreg, but that's just a 'not invented here' attitude. Your problem is that you need this to go into Word, in which case I think a reasonable strategy is to create HTML output in R; then in Word you can use "paste special > HTML" and it will bring in the HTML as a Word table. I recently made a presentation about this; you might scan down to the end, where I have the HTML example for the poor, pitiful Word users of the world: http://pj.freefaculty.org/SummerCamp2010/regression2.pdf Look down around slide 75. pj

On Fri, Jul 2, 2010 at 12:34 PM, Yi liuyi.fe...@gmail.com wrote: Hi, folks, I would like to copy the output of summary(lm) and anova(lm) in R to my Word file. But the output will be a mess if I just copy after I call summary and anova.

x = rnorm(10)
y = rnorm(10, mean=3)
lm = lm(y ~ x)
summary(lm)

Call: lm(formula = y ~ x)
Residuals:
      Min        1Q    Median        3Q       Max
-1.278567 -0.312017  0.001938  0.297578  1.310113
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.5221     0.2272  11.103 3.87e-06 ***
x            -0.5988     0.2731  -2.192   0.0597 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.7181 on 8 degrees of freedom
Multiple R-squared: 0.3753, Adjusted R-squared: 0.2972
F-statistic: 4.807 on 1 and 8 DF, p-value: 0.0597

How can I get the exact output as shown in R but not as the above? Thanks. Yi

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
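As a concrete sketch of the HTML route described above: this assumes the xtable package is installed (xtable and its print(type = "html") option are my suggestion here, not something named in the thread), and the data are made up.

```r
## write summary(lm) and anova(lm) as HTML tables that Word can
## ingest via "paste special" as HTML
library(xtable)

set.seed(1)
x <- rnorm(10)
y <- rnorm(10, mean = 3)
fit <- lm(y ~ x)

## xtable has methods for summary.lm and anova objects
print(xtable(summary(fit)), type = "html", file = "fit.html")
print(xtable(anova(fit)), type = "html", file = "anova.html")
```

Opening fit.html in a browser and copying the table, or pasting the file contents into Word as HTML, preserves the tabular layout that the plain console output loses.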
Re: [R] plotmath vector problem; full program enclosed
On Tue, Jul 6, 2010 at 12:41 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 06/07/2010 10:54 AM, Paul Johnson wrote: Here's another example of my plotmath whipping boy, the Normal distribution. You want as.expression(b1), not expression(b1). The latter means "the expression consisting of the symbol b1". The former means "take the object stored in b1, and convert it to an expression". It's not perfect, because you'll end up with mu - -1.96sigma (i.e. two minus signs), but it's closer than what you had. Duncan Murdoch

Hi, Duncan and David: Thanks for looking. I suspect from the comment you did not run the code. The expression examples I give do work fine already. But I have to explicitly put in values like 1.96 to make them work. I'm trying to avoid that with substitute, which does work for b2, b3, b4, and b5, all but b1. Why just one? I'm uploading a picture of it so you can see for yourself: http://pj.freefaculty.org/R/plotmathwrong.pdf Please look in the middle axis. Why does only b1 not work, but the rest do? -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] plotmath vector problem; full program enclosed
On Wed, Jul 7, 2010 at 5:41 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: You want as.expression(b1), not expression(b1). The latter means "the expression consisting of the symbol b1". The former means "take the object stored in b1, and convert it to an expression".

Thanks to Duncan and Allen, who pointed out that I was not even reading my own code carefully. I apologize for trying your patience. Before I stop swinging at this one, I still want to bother everybody about one thing, which really was the original question, but I did not know the words to ask it. The full code below is a working example, but I don't understand why it works. Focus on these two commands that produce 2 axes. Both produce the desired output, but, as far as I can see, they should not!

1: axis(1, line=6, at=mu+dividers*sigma, labels=as.expression(c(b1,b2,b3,b4,b5), padj=-1))

2: axis(1, line=9, at=mu+dividers*sigma, labels=c(as.expression(b1),b2,b3,b4,b5), padj=-1)

This second one shouldn't work, I think. It has as.expression on only the first element, and yet they all come out right. Is there a spill-over effect? My original question should not have asked why b1 does not print correctly when I do this:

axis(1, line=9, at=mu+dividers*sigma, labels=c(expression(b1),b2,b3,b4,b5), padj=-1)

The correct question should have been: why do b2, b3, b4, and b5 get processed properly into plotmath even though they are not expressions?
pj

### Filename: plotMathProblem.R
### Paul Johnson July 7, 2010
### email me paulj...@ku.edu

sigma <- 10.0
mu <- 4.0
myTitle1 <- "The Normal Distribution"  # assumed title; not defined in the post

myx <- seq(mu - 3.5*sigma, mu + 3.5*sigma, length.out=500)
myDensity <- dnorm(myx, mean=mu, sd=sigma)

### xpd needed to allow writing outside strict box of graph
### Need big bottom margin to add several x axes
par(xpd=TRUE, ps=10, mar=c(18,2,2,2))

plot(myx, myDensity, type="l", xlab="", ylab="Probability Density",
     main=myTitle1, axes=FALSE)
axis(2, pos=mu - 3.6*sigma)
axis(1, pos=0)
lines(c(myx[1], myx[length(myx)]), c(0,0)) ### closes off axes

addInteriorLine <- function(x, m, sd){
  for (i in 1:(length(x))){
    lines(c(x[i], x[i]), c(0, dnorm(x[i], m=m, sd=sd)), lty="14", lwd=.2)
  }
}

dividers <- c(qnorm(0.025), -1, 0, 1, qnorm(0.975))
addInteriorLine(mu + sigma*dividers, mu, sigma)

# bquote creates an expression that text plotters can use
t1 <- bquote(mu == .(mu))
mtext(bquote(mu == .(mu)), 1, at=mu, line=-1)

addInteriorLabel <- function(pos1, pos2, m, s){
  area <- abs(100*(pnorm(m + pos1*s, m, s) - pnorm(m + pos2*s, m, s)))
  mid <- m + 0.5*(pos1 + pos2)*s
  text(mid, 0.5*dnorm(mid, m, s), label=paste(round(area, 2), "%"))
}

addInteriorLabel(dividers[1], dividers[2], mu, sigma)
addInteriorLabel(dividers[2], dividers[3], mu, sigma)
addInteriorLabel(dividers[3], dividers[4], mu, sigma)
addInteriorLabel(dividers[4], dividers[5], mu, sigma)

b1 <- substitute(mu - d*sigma, list(d=round(dividers[1], 2)))
b2 <- substitute(mu - sigma)
b3 <- substitute(mu)
b4 <- substitute(mu + sigma)
b5 <- substitute(mu + d*sigma, list(d=round(dividers[5], 2)))

axis(1, line=4, at=mu + dividers*sigma, labels=c(b1, b2, b3, b4, b5), padj=-1)
axis(1, line=6, at=mu + dividers*sigma, labels=as.expression(c(b1, b2, b3, b4, b5), padj=-1))
axis(1, line=9, at=mu + dividers*sigma, labels=c(as.expression(b1), b2, b3, b4, b5), padj=-1)

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
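The spill-over can be checked directly at the console. This is a minimal sketch (mine, not from the thread) of the behaviour the post describes: when any argument to c() is an expression, the whole result is promoted to an expression vector, so every label gets typeset by plotmath.

```r
b1 <- substitute(mu - d * sigma, list(d = 1.96))
b2 <- substitute(mu - sigma)

## with as.expression on the first element, c() returns an
## expression vector covering all elements
e1 <- c(as.expression(b1), b2)
is.expression(e1)   # TRUE

## without it, c() returns a plain list of unevaluated calls
e2 <- c(b1, b2)
is.expression(e2)   # FALSE
```

That is why labels=c(as.expression(b1), b2, b3, b4, b5) renders every label as math even though as.expression was applied only to the first element.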
Re: [R] permutations of a binary matrix with fixed margins
Jérôme, As a first attempt, how about the function below? It works (or not) by randomly sorting the rows and columns, then searching the table for squares with the corners matrix(c(1,0,0,1), ncol=2) and subtracting them from 1 to give matrix(c(0,1,1,0), ncol=2) (and vice versa). Randomized matrices can be produced as a chain where each permutation is seeded with the previous one. Potential problems: 1. Does it really explore all possible permutations? Does it do it in an unbiased way? 2. Related to the above: there is potential autocorrelation (although I haven't found it with my data set), so it might need some dememorisation steps. 3. It's slow, and dememorisation makes it slower. 4. It isn't clear to me whether it needs the added stochastic element, i.e. squares are only flipped if runif(1) < 0.5. In practice it seems to work without it (as far as I can tell, i.e. it isn't autocorrelated using my data set). I suspect there's a much better way of doing this, which might take the margins as an input, and therefore wouldn't be directly derived from any particular matrix structure. Paul

###
# function to permute a binary matrix keeping margins fixed.
# the input x is the binary matrix to be permuted
permute.struct <- function(x) {
  x <- x[rownames(x)[sample(1:nrow(x))], colnames(x)[sample(1:ncol(x))]]
  pattern <- c(1, 0, 0, 1)
  for(i in 1:(nrow(x)-1))
    for(j in 1:(ncol(x)-1))
      for(ii in (i+1):nrow(x))
        for(jj in (j+1):ncol(x)) {
          query <- c(x[c(i,ii), c(j,jj)])
          if((all(query - pattern == 0) || all(query == 1 - pattern)) && runif(1) < 0.5)
            x[c(i,ii), c(j,jj)] <- 1 - x[c(i,ii), c(j,jj)]
        }
  x
}
###

From: Mathieu Jérôme jerome.mathieu_at_snv.jussieu.fr Date: Thu 05 Apr 2007 - 13:34:01 GMT Dear R Users, How can I randomize a matrix (which contains only 0 and 1) with the constraint that margin totals (column and row sums) are kept constant? Some have called this a "swap" permutation or something like that. The principle is to find a block of
10
01
and change it for
01
10
where there can be several rows or columns between these numbers.
It probably exists in R, but I can't find the function. I am aware of permcont, but it works only for 2x2 matrices. Thanks in advance, Jerome -- Jérôme Mathieu Laboratory of soil tropical ecology UPMC [EMAIL PROTECTED]
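A quick way to convince yourself the margins survive the swaps: a usage sketch with a condensed copy of the permute.struct function from the post above (rewritten with integer row/column indices so it also works on unnamed matrices; the data are made up).

```r
## condensed copy of permute.struct: shuffle rows/cols, then flip
## 2x2 checkerboard blocks, which leaves all margins unchanged
permute.struct <- function(x) {
  x <- x[sample(nrow(x)), sample(ncol(x))]
  pattern <- c(1, 0, 0, 1)
  for (i in 1:(nrow(x) - 1)) for (j in 1:(ncol(x) - 1))
    for (ii in (i + 1):nrow(x)) for (jj in (j + 1):ncol(x)) {
      query <- c(x[c(i, ii), c(j, jj)])
      if ((all(query == pattern) || all(query == 1 - pattern)) &&
          runif(1) < 0.5)
        x[c(i, ii), c(j, jj)] <- 1 - x[c(i, ii), c(j, jj)]
    }
  x
}

set.seed(7)
m <- matrix(rbinom(30, 1, 0.4), nrow = 5)
p <- permute.struct(m)

## rows and columns are shuffled, so compare sorted margin totals
all(sort(rowSums(p)) == sort(rowSums(m)))
all(sort(colSums(p)) == sort(colSums(m)))
```

Each 2x2 flip moves a 1 within the same pair of rows and the same pair of columns, so the preservation of margins holds by construction.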
[R] Interaction Terms versus Interaction Effects in logistic regression
I would like to know more about the output from the terms option in predict(), especially for a glm, and especially when there is an interaction effect being considered. Here's why I ask. These articles were recently brought to my attention. They claim that just about everybody who has reported an interaction coefficient in a logit or probit glm has interpreted it incorrectly. Ai, C. and E.C. Norton. 2003. Interaction Terms in Logit and Probit Models. Economics Letters 80(1):123-129. Norton, E.C., H. Wang, and C. Ai. 2004. Computing interaction effects and standard errors in logit and probit models. The Stata Journal 4(2):154-167. These articles are available here: http://www.unc.edu/~enorton/ along with the Stata ado file that makes the calculations. It seems to me the basic point here is that an interaction changes the slope of a line, so the predicted value changes, of course. It may go up or down, depending on whether the case considered is on the left or right. I don't see that as a unique problem for logit models. It seems to be an artifact of Euclidean geometry :) The logistic regression model does complicate the application of this idea to making predictions, because the positioning of a case depends on the values of all input variables, not just the one considered in the interaction. This is why I'm wishing I had a better understanding of the terms option in predict. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
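On what type = "terms" actually returns: here is a small sketch (the data and variable names are invented for illustration). For a glm, predict(..., type = "terms") gives the centered contribution of each model term on the linear-predictor (logit) scale, one column per term, with the centering constant stored as an attribute.

```r
set.seed(42)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- rbinom(200, 1, plogis(0.5 + d$x1 - d$x2 + 0.8 * d$x1 * d$x2))
m <- glm(y ~ x1 * x2, data = d, family = binomial)

## one column per term, including the interaction term x1:x2
tt <- predict(m, type = "terms")
colnames(tt)   # "x1" "x2" "x1:x2"

## the term columns plus the "constant" attribute reassemble the
## linear predictor, i.e. predict(type = "link")
lp <- rowSums(tt) + attr(tt, "constant")
all.equal(unname(lp), unname(predict(m, type = "link")))
```

So the interaction's contribution for each case is a separate column, but it is on the logit scale; translating it into a marginal effect on the probability scale is exactly the step the Ai and Norton articles are concerned with.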
[R] as.numeric(.) returns 0
In R version 2.7.0 (2008-04-22), as.numeric(".") returns zero:

as.numeric(".")
[1] 0

This must be a bug. Splus and previous versions of R (<= 2.6.0) return NA, as you might expect. I'm running R version 2.7.0 (2008-04-22) on Windows XP. Paul _ Paul Johnson Robertson Centre for Biostatistics University of Glasgow Glasgow G12 8QQ, UK [EMAIL PROTECTED] http://www.stats.gla.ac.uk/~paulj/index.html http://www.rcb.gla.ac.uk/
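A quick sanity check in a later R version (the behaviour reported above was corrected after 2.7.0; I have not verified the exact release of the fix):

```r
## "." is not parseable as a number, so coercion should yield NA
## (with an "NAs introduced by coercion" warning), not 0
val <- suppressWarnings(as.numeric("."))
is.na(val)
```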
Re: [R] restricted coefficient and factor in linear regression.
On Sat, Jun 14, 2008 at 7:49 AM, Oh Dong-hyun [EMAIL PROTECTED] wrote: Hi, my data set is data.frame(id, yr, y, l, e, k). I would like to estimate Lee and Schmidt's (1993, OUP) model in R. My colleague wrote SAS code as follows:

** procedures for creating dummy variables are omitted **
** di# and dt# are dummy variables for industry and time **
data a2; merge a1 a2 a; by id yr;
proc sysnlin maxit=100 outest=beta2;
endogenous y;
exogenous l e k di1-di12 dt2-dt10;
parms a0 0.94 al -0.14 ae 1.8 ak -0.9
      b1 0 b2 0 b3 0 b4 0 b5 0 b6 0 b7 0 b8 0 b9 0 b10 0 b11 0 b12 0
      c2 0 c3 0 c4 0 c5 0 c6 0 c7 0 c8 0 c9 0 c10 0;
y=a0+al*l+ae*e+ak*k
  +(b1*di1+b2*di2+b3*di3+b4*di4+b5*di5+b6*di6
    +b7*di7+b8*di8+b9*di9+b10*di10+b11*di11+b12*di12)*
   (1*dt1+c2*dt2+c3*dt3+c4*dt4+c5*dt5+c6*dt6+c7*dt7
    +c8*dt8+c9*dt9+c10*dt10);
title '* lee/schmidt parameter estimates *';

My R code is as follows:

## library(plm)
dt <- read.table("dt.dta", sep = "\t", header = TRUE)
dt$id <- factor(dt$id)
dt$yr <- factor(dt$yr)
fit.model <- I(log(y)) ~ I(log(l)) + I(log(e)) + yr * id
re.fit.gls <- pggls(fit.model, data = dt)
# I've got the following error message:
### Error message ###
Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent
### End of Error message ###

I would like to figure out three things. 1. How can I restrict a coefficient in the model? As you can see in the SAS code, the coefficient of dt1 is restricted to 1. 2. If it is possible to restrict coefficients, is it possible to restrict coefficients of factors? If so, how?

Hello, I've not used the package plm very much. I've been reading its docs now and I don't think it is exactly what you want, since you've not described a panel data problem. Possibly you need to take this up with the plm author. If I were you, I'd go in another direction. First, fit with ordinary 'lm', just to check the sanity of the data. Second, get the package systemfit, in which there is a nonlinear system fitting routine comparable to the SAS sysnlin that your colleague is using.
Let us know how it works out. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] R vs. Bugs
Hey, good topic for a thread. I've wrestled with this over the years. I think there's some user confusion about what WinBUGS does. People who did not see BUGS before WinBUGS tend not to understand this in the same way... The unique / important contributions from WinBUGS are the collection of working examples and the Doodle code generator thing. All of the rest of it can be easily replaced by open source tools that exist or could easily exist. On Sun, Jun 22, 2008 at 11:34 AM, Peter Muhlberg [EMAIL PROTECTED] wrote: I've done some looking around in R and elsewhere to answer my question on the value of R vs. Bugs for MCMC. So, for anyone who is curious, here's what I think I've found: Bugs compiles its code, which should make it much faster than a pure R program. Packages such as AMCMC run MCMC in R, potentially with a user-defined C function for the density, which should make it comparable in speed to Bugs. The packages MCMCpack (MCMCmetrop1R function) and mcmc seem designed to run with a density function written in R. MCMCpack does have functions that use precompiled C code from the Scythe library (which looks nice), but I see no simple way to add a C density function. AMCMC and Bugs seem to use adaptive MCMC, but the other R packages don't appear to do so, which may mean another performance reduction. Think of performance as a combination of development time and run time. Andrew Martin's MCMCpack reduces development time by giving people some pre-packaged Gibbs sampling fitters for standard models, such as logistic regression. It still uses the same iterative sampling routines under the hood as most other MCMC approaches. The only difference there is that it has routines formatted in a way that will be familiar to R users. I do not believe a simulation model conducted in MCMCpack will take a different amount of run time than a well coded, custom version of the same that is prepared in WinBUGS or any other Gibbs sampler.
The big difference is that MCMCpack offers routines for familiar models, and if one wants to throw in some random parameter here or there, then MCMCpack won't be able to handle it. The U. Chicago professors provide the package bayesm, which is roughly the same kind of thing. For pre-existing model types, an MCMC model can be conducted. Students ask "Why learn BUGS when I can use MCMCpack (or bayesm)?" Answer: no reason, unless you want to propose a model that is not already coded up by the package. I see no way to insert my own proposal density in the R functions. JAG, a Java-based version of BUGS, apparently allows users to create ... If you mean to refer to Martyn Plummer's project JAGS: http://www-ice.iarc.fr/~martyn/software/jags/ JAGS is Just Another Gibbs Sampler, and it is decidedly not written in Java. It is, rather, written in C++. JAGS is proposed as a more-or-less complete drop-in replacement for BUGS, so the user can write up a BUGS style model and then hand it over to JAGS for processing, the same way we used BUGS before WinBUGS came along and tried to make it a pointy-clicky experience. JAGS has versions of the classic BUGS examples, and I think it is very well done. If you want to run a Gibbs sampling exercise in Linux, I suggest you seriously consider using JAGS. You might use WinBUGS's Doodle thingie to sketch out the code for your model, but then translate it to JAGS. (I put a bit of thought into packaging an RPM for Fedora users a few months ago, but ran into a few little packaging errors that put me off of it. Now I'm running Ubuntu and packaging for that is harder, at least for me...) I notice there are some R packages that are providing pre-packaged estimators for common models through JAGS. Witness:

bayescount: Bayesian analysis of count distributions with JAGS
bayesmix: Bayesian Mixture Models with JAGS

It is disappointing/frustrating to me that the source code for WinBUGS/OpenBUGS is kept secret, because here's what I'd really like:

1. Make a version of Doodle for sketching out models. As far as I can see, Doodle is the only truly uniquely valuable component in Win/OpenBUGS. It helps people get started by providing a code template.
2. Create an avenue for that Doodle code to travel to JAGS.

rjags exists as an interface between R and JAGS. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
[R] Trouble combining plotmath, bquote, expressions
I'm using R-2.6.2 on Fedora Linux 9. I've been experimenting with plotmath. I wish it were easier to combine expressions in plotmath with values from the R program itself. There are two parameters in the following example, the mean mymean and standard deviation mystd. I am able to use bquote to write elements into the graph title like "mu = mymean", and R will plot the symbol mu and the value of mymean from the program. But I want to combine that result in a string along with other results. Can I get a combined result like Normal(mu = mymean, sigma = mystd), where the symbols mu and sigma are replaced with Greek and mymean and mystd are drawn from the program?

### Filename: Normal1_2008.R
### Paul Johnson March 31, 2008
### This code should be available somewhere in http://pj.freefaculty.org. If it is not,
### email me [EMAIL PROTECTED]

mymean <- 0
mystd <- 1.5
myx <- seq(mymean - 3*mystd, mymean + 3*mystd, length.out=500)
myDensity <- dnorm(myx, mean=mymean, sd=mystd)

## This works
plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density",
     main=expression(mu == 0))

## This works
plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density",
     main=expression(sigma == 1.5))

## This works
t1 <- bquote(mu == .(mymean))
plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density", main=t1)

## This works
t2 <- bquote(sigma == .(mystd))
plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density", main=t2)

## Can't figure how to combine t1 and t2 into plot title
### This fails! ###
plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density",
     main=paste(t1, t2))

plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density", main=t1)
## Can drop sigma in as margin text, though, on side 3.
mtext(t2, 3)

lines(myx, myDensity, lty=4, col=4) ### change line type and color if you want

Supposing there is a way to do this, could I submit a working example to be added to the help page for plotmath? How could I go about that? -- Paul E.
Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
[R] plotmath overstrikes in output on a Linux system
I've been testing plotmath. But I'm getting some funny output on one computer. The problem is that characters are 'jumbled' and overstrike when symbols are introduced. Sample code:

mu <- 440.0
sigma <- 12.5
myx <- seq(mu - 4*sigma, mu + 4*sigma, length.out=500)
myDensity <- dnorm(myx, mean=mu, sd=sigma)
# Here's one way to retrieve the values of mu and sigma and insert them into an expression
t1t2 <- bquote(paste("Normal(", mu == .(mu), ",", sigma == .(sigma), ")"))
plot(myx, myDensity, type="l", xlab="X", ylab="Probability Density", main=t1t2)

I have tested this code and it works on two desktop Fedora Linux 8 systems to make a nice figure, but on a Dell laptop with Fedora Linux 8 and R 2.6.2, something funny happens: the characters overstrike each other. The Greek letter mu is printed several spaces to the left of the "(" that it is supposed to follow. I made an itty bitty picture of the figure title to show you: http://pj.freefaculty.org/R/plotmath_problem.png I can force in spaces to re-arrange the symbols so they do not overstrike. The following looks fine in the output:

t1t2 <- bquote(paste("Normal ( ", mu == .(mu), " , ", sigma == .(sigma), " )"))
### Note spaces manually inserted above are needed, otherwise plotmath overlaps
plot(myx, myDensity, type="l", xlab="X", ylab="Probability Density", main=t1t2)

What do you suppose is wrong? The X configuration? -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] plotmath overstrikes in output on a Linux system
On Tue, Apr 8, 2008 at 4:29 AM, Prof Brian Ripley [EMAIL PROTECTED] wrote: Looks like the laptop is using different fonts with incorrect font metrics. This could happen because it has a different screen resolution, or one of the systems is set to use scalable fonts, or it is giving metrics for one font and using another, or ... BTW, this is probably nothing to do with plotmath per se -- the issue is most likely the device (presumably X11(), but you did not say) and its handling of font metric requests. It looks to me as if the widths of normal text strings (and not symbols) are the problem. I have seen something similar where there was a long-standing bug in plotmath (fixed in r44534), but this only manifested itself when text was kerned, which the X11() device in 2.6.2 does not ask for. My best guess is that the font selected is (to use the technical terms) returning glyph widths without left/right bearings. Please try 2.7.0 beta, which has a different X11() device (and the above bug fixed). That ought to solve the problem for you, but otherwise you'll need to debug what the X11 metricInfo callback is giving you.

I installed R-latest from the beta directory. Following your leads, I've got some information that may help you see if this is purely an isolated X11 problem on my computer or something you might fix in R. On screen, I see this bad looking plot: http://pj.freefaculty.org/R/png-screenshot-notOK.png Using a png device to write the graph to file, either with dev.copy(png, ...) or with png(file = ...); plot(...); dev.off(), the saved file is GOOD: http://pj.freefaculty.org/R/png-output-OK.png http://pj.freefaculty.org/R/png_dev_copy_OK.png More details about this laptop. The display has a native resolution of 1680x1050, and IF I put the X11 display at that resolution, then the on-screen X11 display of that particular graph is fine. However, if I change the display to another value (the one I prefer is 1400x1050) and re-start, then I see the bad on-screen display.
In both cases, however, I have confirmed that the DPI settings in xdpyinfo are correctly representing pixels/inches. That leads me back to Prof. Ripley's suggestion that I've got a mismatch in fonts between what X11 really wants and what it gets. The font configuration on these systems is becoming so complicated I'm completely bewildered, as Fedora has both traditional linux font server and xft fonts and pango/cairo and freetype and apparently anything else they can think of. I'm going to keep studying the R admin guide to see how to debug it. off topic Finally, something came up that I don't want to forget. It is an RPM building question for R fans. This concerns my experience with R-beta-20080407.r45159. R built installed without any apparent trouble, and I decided to try to make an RPM package of it so other Fedora people can test. I took the spec file from the existing Fedora 8 updates R package, and made a few changes to suit the new file name for R, and I built RPMS. The build proceeds without noticeable trouble, but when I try to install the RPM, I get this error $ sudo rpm -ivh R-2.7.20080407.r45159-1fc8.1.i386.rpm Password: error: Failed dependencies: perl(R::Dcf) is needed by R-2.7.20080407.r45159-1fc8.1.i386 perl(R::Logfile) is needed by R-2.7.20080407.r45159-1fc8.1.i386 perl(R::Rd) is needed by R-2.7.20080407.r45159-1fc8.1.i386 perl(R::Rdconv) is needed by R-2.7.20080407.r45159-1fc8.1.i386 perl(R::Rdtools) is needed by R-2.7.20080407.r45159-1fc8.1.i386 perl(R::Utils) is needed by R-2.7.20080407.r45159-1fc8.1.i386 perl(R::Vars) is needed by R-2.7.20080407.r45159-1fc8.1.i386 As you can see, the package does have the perl modules, but something has gone wrong in the scripting process that takes note of them: $ rpm -qilp R-2.7.20080407.r45159-1fc8.1.i386.rpm | grep perl /usr/share/R/perl /usr/share/R/perl/File /usr/share/R/perl/File/Copy /usr/share/R/perl/File/Copy/Recursive.pm /usr/share/R/perl/R /usr/share/R/perl/R/Dcf.pm /usr/share/R/perl/R/Logfile.pm 
/usr/share/R/perl/R/Rd.pm
/usr/share/R/perl/R/Rdconv.pm
/usr/share/R/perl/R/Rdlists.pm
/usr/share/R/perl/R/Rdtools.pm
/usr/share/R/perl/R/Utils.pm
/usr/share/R/perl/R/Vars.pm
/usr/share/R/perl/Text
/usr/share/R/perl/Text/DelimMatch.pm
/usr/share/R/perl/build-help-windows.pl
/usr/share/R/perl/build-help.pl
/usr/share/R/perl/massage-Examples.pl

I'm contacting the RPM maintainer from Red Hat to see if he knows. But if you know, please let me know. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] plot function / par() settings
On Tue, Apr 8, 2008 at 12:47 PM, Pedro Mardones [EMAIL PROTECTED] wrote: Dear all; I'm trying to create a 2 x 3 plot (something I know lattice can do better) using the plot function. However, I'm not sure how to make the width of the plots the same on each column. I guess the answer may be obvious, but I haven't been able to figure it out. I'll appreciate any suggestion. Here is the (highly inefficient) code for the first row:

Dear Pedro: I played around with the layout function and I think this does what you want. Because I set the widths exactly, there is no squishing of the outer plots anymore. Try this:

nf <- layout(matrix(c(1,2,3,4,5,6), 2, 3, byrow=TRUE),
             widths=c(lcm(5), lcm(5), lcm(5)), heights=c(1,1), TRUE)
layout.show(nf)
par(mar=c(1,0,0,0))
plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", col = "blue")
axis(3, at = seq(1:5), labels = rep("", 5))
plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
axis(3, at = seq(1:5), labels = seq(1:5))
plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
axis(3, at = seq(1:5), labels = rep("", 5))
axis(4, at = seq(1:5), labels = rep("", 5))
plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", col = "blue")
axis(3, at = seq(1:5), labels = rep("", 5))
plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
axis(3, at = seq(1:5), labels = seq(1:5))
plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
axis(3, at = seq(1:5), labels = rep("", 5))
axis(4, at = seq(1:5), labels = rep("", 5))

For comparison, the par(mai)-based code for the first row was:

par(mfrow = c(2, 3))
par(omi = c(0.60, 0.60, 0.60, 0.10))
par(mai = c(0.00, 0.50, 0.50, 0.00))
plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", col = "blue")
axis(3, at = seq(1:5), labels = rep("", 5))
par(mai = c(0.00, 0.00, 0.50, 0.00))
plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
axis(3, at = seq(1:5), labels = seq(1:5))
par(mai = c(0.00, 0.00, 0.50, 0.50))
plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
axis(3, at = seq(1:5), labels = rep("", 5))
axis(4, at = seq(1:5), labels = rep("", 5))
Thanks PM -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] simple graphing question
On Tue, Apr 8, 2008 at 2:18 PM, stephen sefick [EMAIL PROTECTED] wrote: #copy and paste this into R
f <- structure(list(TKN = c(0.103011025, 0.018633208, 0.104235702, 0.074537363, 0.138286096),
                    RM = c(215, 198, 148, 119, 61)),
               .Names = c("TKN", "RM"), class = "data.frame", row.names = 25:29)
plot(f$TKN ~ f$RM, type="b")
I would like to reverse the X-Axis. How do I do this?

Hello, Stephen: It appears you might be new to R, so let me point out a couple of things. First, this works:

f <- data.frame(TKN = c(0.103011025, 0.018633208, 0.104235702, 0.074537363, 0.138286096),
                RM = c(215, 198, 148, 119, 61), row.names = 25:29)
plot(TKN ~ RM, data = f, type = "b", xlim = rev(range(f$RM)))

Note that I've created your data frame in the more usual way, and I've reversed the x axis in the plot by reversing the range of the X variable. I've also used the data argument in plot(). Second, I had never reversed an axis before, but I quickly learned how by typing the following command:

RSiteSearch("reverse axis")

That opened the web browser and listed many items, the second of which was this: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/66958.html, a thread titled How to reverse the sequence of axis Y? Generally, if you try RSiteSearch() and don't find what you need after exploring a page or two of threads, then you can post here and ask questions without people saying go read the posting guide before posting questions. Good luck PJ -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] distance matrix as text file - how to import?
On Tue, Apr 8, 2008 at 1:50 PM, Hans-Jörg Bibiko [EMAIL PROTECTED] wrote: Dear all, I have -hopefully- a tiny problem. I was sent a text file containing a distance matrix à la: 1 2 3 4 5 6

Try this! I put your test data in test.txt and voila:

mat <- matrix(0, 3, 3)
mat[row(mat) >= col(mat)] <- scan("test.txt")

I found this idea after RSiteSearch("scan triangular") led to this item as the very first link: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/22369.html PJ -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
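A minimal sketch of the same idea, with hard-coded values standing in for scan("test.txt") and an extra step to mirror the lower triangle into a full symmetric distance matrix (the 3 x 3 size and the values are just illustrations):

```r
## Fill the lower triangle (including the diagonal) of a 3 x 3 matrix
## from six values, then mirror it to get a symmetric distance matrix.
vals <- c(0, 1, 2, 0, 3, 0)             # stand-in for scan("test.txt")
mat <- matrix(0, 3, 3)
mat[row(mat) >= col(mat)] <- vals       # fills the triangle column by column
mat <- mat + t(mat) - diag(diag(mat))   # copy lower triangle to the upper
mat
```

The subtraction of diag(diag(mat)) keeps the diagonal from being counted twice when the matrix and its transpose are added.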
Re: [R] Problem with NA data when computing standard error
On Tue, Apr 8, 2008 at 12:44 PM, LeCzar [EMAIL PROTECTED] wrote: Hey, I want to compute means and standard errors as two tables like this: se <- function(x) sqrt(var(x)/length(x))

The missings are not your main problem. When x is a matrix or data frame, the command var computes the variance-covariance matrix, and some covariance values can be negative; trying to take square roots of those is a mistake. For example, run example(var) to get some matrices to work with.

C1[3,4] <- NA
C1[3,5] <- NA

Observe that you can calculate var(C1, na.rm=TRUE), but you cannot take the sqrt of that, because it would apply sqrt to negative values. To get the standard errors, it is necessary to reconsider the problem and do something like

diag(var(C1, na.rm=TRUE))

That gives the diagonals, which are positive, so

sqrt(diag(var(C1, na.rm=TRUE)))

works as well. But you have the separate problem of dividing each one by the square root of the number of observations, and since there are missings, that is not the same for every column. Maybe somebody knows a smarter way, but this appears to give the correct answer:

validX <- colSums(!is.na(C1))

This gives the roots: sqrt(validX). Putting that together, it seems to me you could try

se <- function(x) {
  myDiag <- sqrt(diag(var(x, na.rm=TRUE)))
  validX <- colSums(!is.na(x))
  myDiag/sqrt(validX)
}

That works for me:

se(C1)
 Fertility  Agriculture  Examination   Education    Catholic  Infant.Mortality
 50.740226   110.808614    39.390611   39.303898  328.272207          4.513863
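The same per-column standard error of the mean can also be written in one pass with apply(), which sidesteps var()'s covariance matrix entirely. A sketch using made-up data with an NA:

```r
## Per-column standard error of the mean, NA-aware.
se <- function(mat) {
  apply(mat, 2, function(col) {
    n <- sum(!is.na(col))              # valid observations in this column
    sd(col, na.rm = TRUE) / sqrt(n)
  })
}
m <- cbind(a = c(1, 2, 3, NA), b = c(4, 6, 8, 10))
se(m)
```

Column "a" uses three valid observations and column "b" uses four, so each column automatically gets its own divisor.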
Re: [R] If statements for vectors
On Wed, Apr 9, 2008 at 4:07 PM, Laura Bonnett [EMAIL PROTECTED] wrote: Dear Sirs, I am using both the Bioconductor add-ons (Affy, AffyPLM, ...) and the 'standard' R package. I am trying to select a list of genes which all have expression values below a certain threshold. I have done this by creating a vector which has 0s where the expression is greater than the threshold and 1s where it is less than or equal to it. Multiplying this vector by the expression values produces a list of 0s and expression values below the threshold value. However, now I need to remove the 0s. I thought that this would be relatively trivial but it appears it isn't!

Without a working example from you, I have no way to test this proposal. But if I were you, I would get the index values of the right cases in one step, and then use that to choose the ones I want:

theGoodOnes <- which(exp2 > 0)
exp3 <- exp2[theGoodOnes]

This can be crammed into one line, but I'd do it in two, just to make sure it is correct, at least the first time. My example:

x <- rnorm(100)
which(x > 0)
 [1]  9 11 12 13 14 16 19 20 21 22 24 25 28 30 32 34 35 36 38 40 41 42 43 44 46
[26] 47 49 50 53 54 57 59 61 63 64 66 68 69 70 72 75 78 80 81 82 83 88 90 97 98
theGoodOnes <- which(x > 0)
newX <- x[theGoodOnes]
newX
 [1] 0.89285908 0.51753998 1.18485887 2.10003705 0.54535841 1.28313738
 [7] 1.34092172 0.76064356 0.02201121 0.80808363 0.04578730 0.23045983
[13] 1.04306626 0.12694184 0.89706863 0.86302992 1.53471660 0.51192410
[19] 1.00366834 1.76923470 0.49508470 1.27888454 0.76706729 1.46340483
[25] 1.69315538 0.50537603 0.18422329 0.72968629 0.45490526 2.18208183
[31] 0.71926926 0.68915832 1.49076770 0.48763971 0.39273110 0.80709549
[37] 0.22099019 0.38103757 0.14626929 0.63933750 1.26643194 3.33091910
[43] 2.50341609 2.05286611 0.31986095 0.64548972 0.34712937 0.04075135
[49] 0.07206342 0.20325505
-- Paul E.
Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
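For the record, the which() step can be skipped entirely: a logical vector does the subsetting directly. A tiny sketch:

```r
set.seed(1)
x <- rnorm(10)
via_which   <- x[which(x > 0)]   # two-step version from above
via_logical <- x[x > 0]          # one-step logical subsetting
identical(via_which, via_logical)
```

The two forms agree whenever x has no NAs; with NAs present, which() silently drops them while the logical index keeps them as NA, which is one reason to prefer the explicit which() when data may be incomplete.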
Re: [R] Format regression result summary
On Fri, Apr 11, 2008 at 11:05 AM, Thiemo Fetzer [EMAIL PROTECTED] wrote: Hello to the whole group. I am a newbie to R, but I got my way through and think it is a lot easier to handle than other software packages (far fewer clicks necessary). [snip] However, my wish is for the output to have a format like:

            Estimate
(Intercept) 3.750367***
            (0.172345)
Var1        -0.002334
            (0.009342)
Var2        0.012551*
            (0.005927)

Thanks a lot in advance Thiemo

I attach an R function, outreg, that can make nice-looking regression tables in various formats. I was thinking about writing something for R News about this, but stopped to wait because I could not figure out a way to get lmer objects to print out in some pleasant way. I have several students and colleagues who use this, with no problems lately. This pdf demonstrates some of the kinds of output it can create: http://pj.freefaculty.org/R/outreg-worked2.pdf The R code here is not pretty; I'm obviously trying to do things that were not anticipated by the original developers. I found it a bit tedious to do output with cat and so forth, but never found a more elegant way. If anybody wants to help with beautifying the code, please give me some pointers. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] plotmath overstrikes in output on a Linux system
On Tue, Apr 8, 2008 at 1:23 PM, Paul Johnson [EMAIL PROTECTED] wrote: On Tue, Apr 8, 2008 at 4:29 AM, Prof Brian Ripley [EMAIL PROTECTED] wrote: Looks like the laptop is using different fonts with incorrect font metrics. This could happen because it has a different screen resolution, or one of the systems is set to use scalable fonts, or it is giving metrics for one font and using another, or ...

I found a fix for the funny font overlaps, and it was quite by accident. In case there are other Fedora users for whom this might appear, I would like to close the thread with this. In the livna repository, there is a package called freetype-freeworld-2.3.5-3.lvn8. After installing that, the fonts on screen are fine. I do not understand exactly why this works, or why freetype-freeworld is not in the freetype distribution all along. The package is described thusly, and you will notice it has the same magic words that Professor Ripley used--hints and glyphs:

$ rpm -qi freetype-freeworld
Name        : freetype-freeworld        Relocations: (not relocatable)
Version     : 2.3.5                     Vendor: rpm.livna.org
Release     : 3.lvn8                    Build Date: Tue 18 Sep 2007 10:28:30 AM CDT
Install Date: Sun 13 Apr 2008 11:16:45 PM CDT   Build Host: plague-builder.livna.org
Group       : System Environment/Libraries   Source RPM: freetype-freeworld-2.3.5-3.lvn8.src.rpm
Size        : 685309                    License: FTL or GPLv2+
Signature   : DSA/SHA1, Sat 22 Sep 2007 10:23:19 AM CDT, Key ID 71295441a109b1ec
Packager    : rpm.livna.org http://bugzilla.livna.org
URL         : http://www.freetype.org
Summary     : A free and portable font rendering engine
Description : The FreeType engine is a free and portable font rendering engine, developed to provide advanced font support for a variety of platforms and environments. FreeType is a library which can open and manage font files as well as efficiently load, hint and render individual glyphs. FreeType is not a font server or a complete text-rendering library.
This version is compiled with the patented bytecode interpreter and subpixel rendering enabled. It transparently overrides the system library using ld.so.conf.d. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] rggobi is crashing R-2.7.0
On Tue, May 6, 2008 at 12:32 PM, Mark Kimpel [EMAIL PROTECTED] wrote: I am running 64-bit Ubuntu 8.04, and when I invoke rggobi the interactive graph displays but R crashes. See my sessionInfo() and a short example below. Ggobi and rggobi installed without complaints. Mark

sessionInfo()
R version 2.7.0 Patched (2008-05-04 r45620)
x86_64-unknown-linux-gnu

In the R 2.7 release notes, there is a comment about a change in the GUI libraries, and it says that one must recompile everything that relies on R. If your R 2.7 was an upgrade, not a fresh install, that could explain why this is happening. If there's some old library or R package sitting around, it could account for this. The part that concerned me about the R release note is that they don't give a very clear guide on how far back in the toolchain we are supposed to go. Certainly, ggobi has to be rebuilt from scratch. But do any of the things on which ggobi depends need recompilation as well? pj -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
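One way to spot packages left over from an older R, and a sketch of how to reinstall them in bulk; this only covers CRAN-style add-on packages, so a compiled interface like rggobi still needs its system library (ggobi itself) rebuilt separately:

```r
## Packages whose "Built" stamp does not match the running R
## (major.minor) are candidates for reinstallation after an upgrade.
ip <- installed.packages()
built_mm   <- sub("\\.[0-9]+$", "", ip[, "Built"])
running_mm <- paste(R.version$major,
                    sub("\\.[0-9]+$", "", R.version$minor), sep = ".")
stale <- ip[built_mm != running_mm, "Package"]
stale
## update.packages(ask = FALSE, checkBuilt = TRUE) would then
## reinstall everything built under an older version.
```

The checkBuilt = TRUE argument to update.packages() is what forces a rebuild check rather than a plain version comparison.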
Re: [R] poisson regression with robust error variance ('eyestudy
On Thu, May 8, 2008 at 8:38 AM, Ted Harding [EMAIL PROTECTED] wrote: The below is an old thread: On 02-Jun-04 10:52:29, Lutz Ph. Breitling wrote: Dear all, i am trying to redo the 'eyestudy' analysis presented on the site http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm with R (1.9.0), with special interest in the section on relative risk estimation by poisson regression with robust error variance. so i guess rlm is the function to use. but what is its equivalent to the glm's argument family to indicate 'poisson'? or am i somehow totally wrong and this is not applicable here?

There have been several questions about getting robust standard errors in glm lately. I went and read that UCLA website on the RR eye study and the Zou article that uses a glm with robust standard errors. I don't think rlm is the right way to go, because that gives different parameter estimates. You need to estimate with glm and then get standard errors that are adjusted for heteroskedasticity. Well, you may wish to use rlm for other reasons, but to replicate that eyestudy project, you need to take the ordinary usual estimates of the b's and adjust the standard errors. The Zou article does not give the mathematical formulae used to estimate those robust errors; rather, it gives a code snippet for using SAS proc GENMOD to get those estimates. Presumably, if we had access to the SAS formula, we could easily get the calculations we need with R. It is a little irksome to me that people think saying "use SAS proc GENMOD" is informative. Rather, we really need to know which TYPE of robust standard error is being considered, since there are about 10 competing formulations. I started checking various R packages for calculating sandwich variance estimates. There is a package, sandwich, for exactly that purpose, and in the car package there is a function hccm that can do so. The hccm in car refuses to take glm objects, but the function vcovHC in sandwich will take them.
The discussion for sandwich's vcovHC function refers to linear models, and lm objects are used in the examples, but the vignette distributed with the package states: "The HAC estimators are already available for generalized linear models (fitted by glm) and robust regression (fitted by rlm in package MASS)." As a result, one can get a var/covar matrix from the routines in the sandwich package, but I'm not entirely sure they match the ones SAS gives. The vcovHC help page has a nice explanation of the differences across sandwich estimators. It says they are all variants on

(X'X)^{-1} X' Omega X (X'X)^{-1}

with different stipulations about Omega. If we knew the stipulation about Omega used in the SAS routine, we could try it. I tried to get the eyestudy data, but the link on the UCLA website is no longer valid. So I generated some phony 0-1 data:

y <- as.numeric(rnorm(1000) > 0)
x <- rnorm(1000)
glm1 <- glm(y ~ x, family=binomial)
hccm(glm1)
Error in hccm.lm(glm1) : requires unweighted lm

glm1 <- glm(y ~ x, family=poisson(link=log))
vcovHC(glm1)
              (Intercept)             x
(Intercept)  1.017588e-03 -3.722456e-05
x           -3.722456e-05  9.492517e-04

The default type of estimation is the method called HC3, which is recommended by the authors of some Monte Carlo research studies. vcovHC calculates White's basic correction if you run:

vcovHC(glm1, type="HC0")
              (Intercept)             x
(Intercept)  1.013508e-03 -3.691381e-05
x           -3.691381e-05  9.417839e-04

Compare against the non-robust glm var/covar matrix:

vcov(glm1)
              (Intercept)             x
(Intercept)  0.0020152998 -0.0000778422
x           -0.0000778422  0.0018721903

In conclusion, use glm followed by vcovHC and I believe you will find estimates like the ones provided by SAS or Stata. But, without access to their data, I can't be entirely sure. -- Paul E.
Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
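To make the (X'X)^{-1} X' Omega X (X'X)^{-1} formula quoted above concrete, here is a hand-rolled White/HC0 estimator for an ordinary lm, with Omega = diag(residuals^2). This is a sketch of the plain linear-model case; for a glm, the working residuals and weights change, which is exactly what vcovHC handles internally:

```r
set.seed(42)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)

X <- model.matrix(fit)
u <- residuals(fit)
bread   <- solve(crossprod(X))        # (X'X)^{-1}
meat    <- crossprod(X * u)           # X' diag(u^2) X
vcovHC0 <- bread %*% meat %*% bread   # White's HC0 sandwich
sqrt(diag(vcovHC0))                   # robust standard errors
```

Multiplying the matrix X elementwise by the residual vector u scales each row i by u[i], so crossprod(X * u) really is X' diag(u^2) X.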
Re: [R] poisson regression with robust error variance ('eyestudy
Ted Harding said: I can get the estimated RRs from RRs <- exp(summary(GLM)$coef[,1]) but do not see how to implement confidence intervals based on robust error variances using the output in GLM.

Thanks for the link to the data. Here's my best guess. If you use the following approach, with the HC0 type of robust standard errors in the sandwich package (thanks to Achim Zeileis), you get almost the same numbers as that Stata output gives. The estimated b's from the glm match exactly, but the robust standard errors are a bit off.

### Paul Johnson 2008-05-08
### sandwichGLM.R
system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")
library(foreign)
dat <- read.dta("eyestudy.dta")
### Ach, Stata codes factor contrasts backwards
dat$carrot0 <- ifelse(dat$carrot == 0, 1, 0)
dat$gender1 <- ifelse(dat$gender == 1, 1, 0)
glm1 <- glm(lenses ~ carrot0, data=dat, family=poisson(link=log))
summary(glm1)
library(sandwich)
vcovHC(glm1)
sqrt(diag(vcovHC(glm1)))
sqrt(diag(vcovHC(glm1, type="HC0")))

### Result:
# summary(glm1)
# Call:
# glm(formula = lenses ~ carrot0, family = poisson(link = log), data = dat)
# Deviance Residuals:
#     Min      1Q  Median      3Q     Max
# -1.1429 -0.9075  0.3979  0.3979  0.7734
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept)  -0.8873     0.2182  -4.066 4.78e-05 ***
# carrot0       0.4612     0.2808   1.642    0.101
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1
# (Dispersion parameter for poisson family taken to be 1)
# Null deviance: 67.297 on 99 degrees of freedom
# Residual deviance: 64.536 on 98 degrees of freedom
# AIC: 174.54
# Number of Fisher Scoring iterations: 5

# sqrt(diag(vcovHC(glm1, type="HC0")))
# (Intercept)     carrot0
#   0.1673655   0.1971117

### Compare against
## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
## robust standard errors are:
## .1682086 .1981048

glm2 <- glm(lenses ~ carrot0 + gender1 + latitude, data=dat, family=poisson(link=log))
sqrt(diag(vcovHC(glm2, type="HC0")))

### Result:
# summary(glm2)
# Call:
# glm(formula = lenses ~ carrot0 + gender1 + latitude, family = poisson(link = log), data = dat)
# Deviance Residuals:
#     Min      1Q  Median      3Q     Max
# -1.2332 -0.9316  0.2437  0.5470  0.9466
# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.65212    0.69820  -0.934   0.3503
# carrot0      0.48322    0.28310   1.707   0.0878 .
# gender1      0.20520    0.27807   0.738   0.4605
# latitude    -0.01001    0.01898  -0.527   0.5980
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# (Dispersion parameter for poisson family taken to be 1)
# Null deviance: 67.297 on 99 degrees of freedom
# Residual deviance: 63.762 on 96 degrees of freedom
# AIC: 177.76
# Number of Fisher Scoring iterations: 5

# sqrt(diag(vcovHC(glm2, type="HC0")))
# (Intercept)     carrot0     gender1    latitude
#  0.49044963  0.19537743  0.18481067  0.01275001

### UCLA site has:
## .4929193 .1963616 .1857414 .0128142

So, either there is some rounding error or Stata is not using exactly the same algorithm as vcovHC in sandwich. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
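A guess at the remaining gap: Stata's robust variance for ML estimators applies a small-sample factor of n/(n-1). With n = 100 in the eyestudy data, rescaling the HC0 results reproduces the Stata figures almost exactly. A quick arithmetic check, with the numbers copied from the output above:

```r
## HC0 standard errors from vcovHC(glm1, type = "HC0"),
## and the robust SEs reported on the UCLA/Stata page.
n     <- 100
hc0   <- c(0.1673655, 0.1971117)
stata <- c(0.1682086, 0.1981048)
hc0 * sqrt(n / (n - 1))   # Stata-style n/(n-1) correction applied to HC0
```

If this is right, then vcovHC(glm1, type = "HC0") times sqrt(n/(n-1)) is the exact Stata replication, and the "rounding error" above is really a degrees-of-freedom convention.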
[R] How to add space between main title to leave space for legend?
Hello, everybody: I recently encountered an example in which the graph was placed in a way that did not leave room for a legend. Maybe you would describe it as "legend too big", I'm not sure. I found myself wishing I could force in some space after the title. Here's a working example where I'd like to make room for a legend.

x <- rnorm(100)
hist(x, freq=FALSE, main="Different Meanings of Normality")
lines(density(x))
xseq1 <- seq(min(x), max(x), length.out=100)
m1 <- mean(x)
sd1 <- sd(x)
obsNormal <- dnorm(xseq1, mean=m1, sd=sd1)
lines(xseq1, obsNormal, lty=2, col="red")
truNormal <- dnorm(xseq1)
lines(xseq1, truNormal, lty=3, col="green")
legend(0, 0.4,
       legend=c("observed density", "normal with observed mean & sd",
                "normal with 'true' mean and sd"),
       lty=c(1,2,3), col=c("black", "red", "green"))

I tried fiddling around with par to change the margins, but it has the bad effect of resizing the image and creating blank space into which I'm not allowed to write the legend. Know what I mean? I try par(mar=c(1,1,3,1)) or par(mai=c(1,1,4,1)) before the hist, and it makes space, but useless space. I've been considering something desperate, such as layout(). Isn't there a simpler way? -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] How to add space between main title to leave space for legend?
On Sat, May 31, 2008 at 2:45 PM, Peter Dalgaard [EMAIL PROTECTED] wrote: Paul Johnson wrote: Hello (1) par(xpd=TRUE) allows you to write outside the plotting region O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B

As usual, PD is right on the money. Here's a working demo program for posterity. For most random samples, this gives a good-looking graph. For the other few, well, they happen less than 0.05 of the time :)

### histogramWithDensityLinesAndLegend.R
### Paul Johnson 2008-06-02
### Thanks to r-help members for tips.
x <- rnorm(100)
### Allows writing outside the plotting region
par(mai=c(1,1,2.5,1))
par(xpd=TRUE)
myhist <- hist(x, freq=FALSE, main="Different Meanings of Normality")
lines(density(x))
xseq1 <- seq(min(x), max(x), length.out=100)
m1 <- mean(x)
sd1 <- sd(x)
obsNormal <- dnorm(xseq1, mean=m1, sd=sd1)
lines(xseq1, obsNormal, lty=2, col="red")
truNormal <- dnorm(xseq1)
lines(xseq1, truNormal, lty=3, col="green")
legend(min(xseq1), 1.3*max(myhist$density),
       legend=c("observed density", "normal with observed mean & sd",
                "normal with 'true' mean and sd"),
       lty=c(1,2,3), col=c("black", "red", "green"))
-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
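An alternative that avoids hand-picking legend coordinates: with par(xpd=TRUE) in effect, legend() also accepts a keyword position plus an inset, and a negative inset pushes the legend up into the enlarged top margin. A sketch (the margin and inset values are just illustrative guesses to tune by eye):

```r
set.seed(7)
x <- rnorm(100)
par(mar = c(5, 4, 6, 2), xpd = TRUE)   # extra top margin; allow drawing in it
hist(x, freq = FALSE, main = "Different Meanings of Normality")
lg <- legend("topright", inset = c(0, -0.25),  # negative inset = above the plot
             legend = c("observed density", "fitted normal"),
             lty = c(1, 2), col = c("black", "red"))
```

Because the position is a keyword rather than a coordinate, this keeps working when the random sample changes the y-axis range, which was the failure mode of the hard-coded legend(0, 0.4, ...) call.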
[R] why doesn't t.test value include standard error
Hello, everybody! I'm back to ask the obvious on behalf of the silent majority :) Today a student asked me what standard error was used in this t.test output. I looked into it and was a little surprised that a t.test output object does not have a slot for the standard error. Of course, we can reconstruct the se (e.g., se = mu-hat/t), but I was surprised. Do you think it would be nicer if t.test did include the denominator in the output? If we had that output, we could more easily compare the different methods of calculating the standard error that are discussed in ?t.test. ex:

x <- rnorm(100, m=10)
myt <- t.test(x, m=8)
attributes(myt)
$names
[1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
[6] "null.value"  "alternative" "method"      "data.name"
$class
[1] "htest"
myse <- myt$estimate/myt$statistic
myse
mean of x
0.4928852
myt$statistic
       t
20.33975
myt$estimate/myse
mean of x
 20.33975

Happy summer! Today was our last day of class in Kansas. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
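One caution about the reconstruction above: t = (estimate - null) / se, so dividing the estimate by t only recovers the standard error when the null value is 0. With m = 8 the null has to be subtracted first; estimate/statistic gives 10/20.3, about 0.49, while the actual standard error is near 0.1. A sketch with the same setup:

```r
set.seed(123)
x <- rnorm(100, mean = 10)
myt <- t.test(x, mu = 8)

## standard error recovered from the htest object
se_from_t <- (myt$estimate - myt$null.value) / myt$statistic
## standard error computed directly from the data
se_direct <- sd(x) / sqrt(length(x))
all.equal(unname(se_from_t), se_direct)
```

Since t.test defines its statistic as (mean - mu) / stderr with stderr = sd/sqrt(n), the two quantities agree exactly up to floating point.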
[R] times family unavailable in postscript device (Ubuntu Linux)
I'm running Ubuntu 9.04. I could use some advice about fonts in postscript devices.

sessionInfo()
R version 2.9.0 (2009-04-17)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

I can use family="Times" with pdf output, but postscript refuses. It says:

plot(rnorm(10), rnorm(10), family="Times")
Error in axis(side = side, at = at, labels = labels, ...) :
  family 'Times' not included in PostScript device

This happens even though Times *appears* to be listed as a valid family:

names(postscriptFonts())
 [1] "serif"                "sans"                 "mono"
 [4] "AvantGarde"           "Bookman"              "Courier"
 [7] "Helvetica"            "Helvetica-Narrow"     "NewCenturySchoolbook"
[10] "Palatino"             "Times"                "URWGothic"
[13] "URWBookman"           "NimbusMon"            "NimbusSan"
[16] "URWHelvetica"         "NimbusSanCond"        "CenturySch"
[19] "URWPalladio"          "NimbusRom"            "URWTimes"
[22] "ComputerModern"       "ComputerModernItalic" "Japan1"
[25] "Japan1HeiMin"         "Japan1GothicBBB"      "Japan1Ryumin"
[28] "Korea1"               "Korea1deb"            "CNS1"
[31] "GB1"

example(postscriptFonts)
pstscF <- postscriptFonts()
pstscF

$serif
$family
[1] "Times"
$metrics
[1] "Times-Roman.afm"      "Times-Bold.afm"       "Times-Italic.afm"
[4] "Times-BoldItalic.afm" "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$sans
$family
[1] "Helvetica"
$metrics
[1] "Helvetica.afm"             "Helvetica-Bold.afm"
[3] "Helvetica-Oblique.afm"     "Helvetica-BoldOblique.afm"
[5] "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$mono
$family
[1] "Courier"
$metrics
[1] "Courier.afm"             "Courier-Bold.afm"
[3] "Courier-Oblique.afm"     "Courier-BoldOblique.afm"
[5] "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$AvantGarde
$family
[1] "AvantGarde"
$metrics
[1] "agw_.afm" "agd_.afm" "agwo.afm" "agdo.afm" "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$Bookman
$family
[1] "Bookman"
$metrics
[1] "bkl_.afm" "bkd_.afm" "bkli.afm" "bkdi.afm" "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$Courier
$family
[1] "Courier"
$metrics
[1] "Courier.afm"             "Courier-Bold.afm"
[3] "Courier-Oblique.afm"     "Courier-BoldOblique.afm"
[5] "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$Helvetica
$family
[1] "Helvetica"
$metrics
[1] "Helvetica.afm"             "Helvetica-Bold.afm"
[3] "Helvetica-Oblique.afm"     "Helvetica-BoldOblique.afm"
[5] "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$`Helvetica-Narrow`
$family
[1] "Helvetica-Narrow"
$metrics
[1] "hvn_.afm" "hvnb.afm" "hvno.afm" "hvnbo___.afm" "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$NewCenturySchoolbook
$family
[1] "NewCenturySchoolbook"
$metrics
[1] "ncr_.afm" "ncb_.afm" "nci_.afm" "ncbi.afm" "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$Palatino
$family
[1] "Palatino"
$metrics
[1] "por_.afm" "pob_.afm" "poi_.afm" "pobi.afm" "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$Times
$family
[1] "Times"
$metrics
[1] "Times-Roman.afm"      "Times-Bold.afm"       "Times-Italic.afm"
[4] "Times-BoldItalic.afm" "Symbol.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$URWGothic
$family
[1] "URWGothic"
$metrics
[1] "a010013l.afm" "a010015l.afm" "a010033l.afm" "a010035l.afm" "s05l.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$URWBookman
$family
[1] "URWBookman"
$metrics
[1] "b018012l.afm" "b018015l.afm" "b018032l.afm" "b018035l.afm" "s05l.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$NimbusMon
$family
[1] "NimbusMon"
$metrics
[1] "n022003l.afm" "n022004l.afm" "n022023l.afm" "n022024l.afm" "s05l.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$NimbusSan
$family
[1] "NimbusSan"
$metrics
[1] "n019003l.afm" "n019004l.afm" "n019023l.afm" "n019024l.afm" "s05l.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$URWHelvetica
$family
[1] "URWHelvetica"
$metrics
[1] "n019003l.afm" "n019004l.afm" "n019023l.afm" "n019024l.afm" "s05l.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$NimbusSanCond
$family
[1] "NimbusSanCond"
$metrics
[1] "n019043l.afm" "n019044l.afm" "n019063l.afm" "n019064l.afm" "s05l.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"
$CenturySch
$family
[1] "CenturySch"
$metrics
[1] "c059013l.afm" "c059016l.afm" "c059033l.afm" "c059036l.afm" "s05l.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$URWPalladio
$family
[1] "URWPalladio"
$metrics
[1] "p052003l.afm" "p052004l.afm" "p052023l.afm" "p052024l.afm" "s05l.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$NimbusRom
$family
[1] "NimbusRom"
$metrics
[1] "n021003l.afm" "n021004l.afm" "n021023l.afm" "n021024l.afm" "s05l.afm"
$encoding
[1] "default"
attr(,"class")
[1] "Type1Font"

$URWTimes
$family
[1]
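For what it's worth, one workaround while the font metrics are sorted out is to name the family when the postscript device is opened, rather than per plotting call (a sketch; the output file name is arbitrary):

```r
## Declare the family on the postscript device itself.
postscript("times-test.ps", family = "Times")
plot(rnorm(10), rnorm(10), main = "Times on the device")
dev.off()
```

Per ?postscript, only families passed to the device (via family or fonts) are embedded in the output, which is consistent with the "not included in PostScript device" wording of the error above.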
Re: [R] fitdistr for t distribution
On Fri, May 15, 2009 at 6:22 AM, lagreene lagreene...@gmail.com wrote: Thanks Jorge, but I still don't understand where they come from. When I use fitdistr(mydata, "t", df = 9) and get values for m and s, should the variance of my data be df/s? I just want to be able to confirm how m and s are calculated.

I've wondered the same kind of thing, and I've learned the answer is easy! It is not so easy for all R functions, but did you try this with fitdistr?

library(MASS)
fitdistr

The output that follows is the ACTUAL FORMULA that is used to make the calculations! I've not yet mastered the art of getting code for some functions:

predict
function (object, ...)
UseMethod("predict")
<environment: namespace:stats>

But I know there is a way to get that code if you know the correct way to run getS3method(). I usually just go read the R source code rather than puzzle over that. For the t case, the density fitdistr uses amounts to:

mydt <- function(x, m, s, df) dt((x - m)/s, df)/s
fitdistr(x2, mydt, list(m = 0, s = 1), df = 9, lower = c(-Inf, 0))

Thanks anyway for the help! -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
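A minimal sketch of the getS3method() route mentioned above, using predict since fitdistr is an ordinary function whose source prints directly:

```r
## predict() itself is only a dispatcher...
predict
## ...but the class-specific body can be fetched by name:
predict_lm <- getS3method("predict", "lm")
is.function(predict_lm)
```

The same call works for any registered S3 method, including ones hidden in a package namespace, which is why printing the generic alone shows nothing but UseMethod().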
Re: [R] save the output of summary(lmList(x)) into a dataframe
On Tue, Jun 16, 2009 at 3:29 AM, Cecilia Carmo cecilia.carmo@ua.pt wrote: Hi r-helpers! I need to save the output of the summary() function that I've run like this:

z <- lmList(y ~ x1 + x2 | x3, na.action=na.omit, data1, subset = year==1999)
w <- summary(z)

The output (w) is something like this: Call: Model: y ~ x1 + x2 | x3 Data: data1

Does this come close? I'm just using the lmList example from lme4.

### example(lmList) spawns an object fm1 that I demonstrate with:
library(lme4)
example(lmList)
mylist <- lapply(fm1, function(f) summary(f)$coefficients)
varnames <- names(fm1[[1]]$coefficients)
myCoefficients <- function(x, name) { x[name, ] }
theIntercepts <- t(sapply(mylist, myCoefficients, varnames[1]))
theSecondVar <- t(sapply(mylist, myCoefficients, varnames[2]))

I know other r-help readers will know a better way to cycle through the variables rather than running once for each element of varnames... -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
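A self-contained version of the same pattern that cycles over all the variables at once, using plain lm() fits split by a grouping factor so it runs without lme4 (the mtcars variables are purely illustrative):

```r
## One regression per group, then stack one row of the coefficient
## table per group, for each coefficient name in turn.
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) summary(lm(mpg ~ wt, data = d)))
varnames <- rownames(fits[[1]]$coefficients)
perVariable <- lapply(varnames, function(v)
  t(sapply(fits, function(s) s$coefficients[v, ])))
names(perVariable) <- varnames
perVariable[["wt"]]   # Estimate / Std. Error / t / p, one row per cyl group
```

The outer lapply over varnames replaces the one-call-per-variable approach, so adding a regressor to the formula requires no new extraction code.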
[R] R for Mac 10.3.9
I know nothing of Macintosh, so please be patient. My student has a Macintosh with OS X 10.3.9. The R for Mac page says it is for 10.4.4 or higher. Aside from saying get a new Mac, what can be said to my student? Can you point me at the newest version of R that did work on 10.3? pj -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
[R] Working up examples of barplots with customized marginal items
Hello, everybody: I'm back to share a long piece of example code. Because I had trouble understanding the role of par() and margins when customizing barplots, I have struggled with this example. In case you are a student wanting to place legends or textual markers in the outer regions of a barplot, you might run these examples. There are a few little things I don't quite understand. If you look below where I've placed ???, you see I've used text() to write inside the bars of a plot. To vertically align the strings, I use the text() function's option pos. However, the text is not centered horizontally inside the bars. I don't think I found a perfectly general solution with offset, because it does not take into account the width of the bars. Second, can changes to par be forced on all devices? I learned the hard way that a change like par(mar=c(10,3,4,5)) will apply to the current working device, but when a new output device is created, the margins are re-set to defaults. The margins in the saved graph won't match the screen unless par is run again:

postscript( ---whatever--- )
### par must be run again to manipulate the postscript device:
par( ---whatever--- )
plot( ---whatever--- )
dev.off()

On the other hand, dev.copy(postscript, ---whatever---) will inherit the par values set on the screen device. Is there a single command that can adjust the margins of all devices that are created within a given session? Anyway, here's my big barplot script, and I hope some students benefit from experimenting with it. pj

### Paul Johnson
### Twisting the margins of a barplot
### I never thought too much about customizing barplots, but
### now I have learned some tricks to share. Step through these
### examples to see the possibilities.
set.seed(424242)
x <- rpois(10, lambda=10)
mynames <- c(rep("Really Long Name", 9), "Really Long Name Long Long")
#RSiteSearch("barplot names margins")
barplot(x)
### Note long names run off the edge:
barplot(x, names=mynames, las=2)
### abbreviate() offers one solution
barplot(x, names=abbreviate(mynames), las=2)
### The other option is to reset the margins
### default mar is c(5.1, 4.1, 4.1, 2.1)
### Let's make the bottom margin larger
par(mar=c(15, 4.1, 4.1, 2.1))
barplot(x, names=mynames, las=2)
### Now the bottom is finished. But I'd like a legend.
legend("topleft", legend=c("A really long label","B","Cee What I can do?","D"), col=1:2)
### My bar hits the legend. How to fix?
### It is not sufficient to simply move the legend up,
### because then it runs off the edge of the graph
barplot(x, names=mynames, las=2)
legend(2, 22, legend=c("A really long label","B","Cee What I can do?","D"), col=1:2)
### By itself, changing the top margin does not help.
### It is also vital to set the xpd parameter to TRUE so that
### R will draw outside the main plot region
par(mar=c(10, 4.1, 8.1, 2.1))
par(xpd=TRUE)
barplot(x, names=mynames, las=2)
legend(2, 22, legend=c("A really long label","B","Cee What I can do?","D"), col=1:2)
### Now let's gain some control over the colors and bars.
### The default colors are so dark you can't write on top
### of them. You can grab shades of gray like "gray30" or such.
### I'll just specify 5 colors I know will work. I prefer
### the light gray colors because they can be written over.
mycols <- c("lightgray","gray70","gray80","gray90","gray75")
barplot(x, names=mynames, las=2, col=mycols)
legend(2, 20, legend=c("A","B","C","D"), fill=mycols)
### Still, I don't like the solid shades so much.
### Fill up the bars with angled lines.
### density determines the number of lines inside each bar.
myden <- c(60,20,30,40,25)
### angle determines the direction of lines inside the bars
myangles <- c(20,-30,60,-40,10)
barplot(x, names=mynames, las=2, col=mycols, density=myden, angle=c(20,60,-20))
legend(1, 20, legend=c("A","B","C","D"), density=myden, angle=myangles)
### For my coup de grace, let's do some writing in the bars.
### Recall from Verzani that you can get the x coordinates from barplot
barPositions <- barplot(x, names=mynames, las=2, col=mycols, density=myden, angle=c(20,60,-20))
barPositions
### The text option srt=90 turns the text sideways.
### I'm just guessing that bars 1 and 8 should be labeled
### at y coordinate 5
text(barPositions[1], 5, "my first bar is great", srt=90)
text(barPositions[8], 5, "but 8 is greater", srt=90)
### Recently I had the problem of drawing a clustered bar chart.
### Let's suppose our x variable really represents responses from
### 2 groups of respondents, Men and Women.
## Create the matrix
xmatr <- matrix(x, nrow=5)
### Use beside=TRUE to cause barplot to treat each column
### of values as a group
barplot(xmatr, beside=TRUE, names=c("Men","Women"))
valueNames <- c("Very Strong","Somewhat Strong","Not Strong","Somewhat Weak","Very Weak")
### Use mtext to write in the margin so that the labels on the plot are nice.
par(mar=c(10, 4.1, 5.1, 2.1))
bp <- barplot(xmatr, beside=TRUE, names=NULL)
### bp contains the positions of the bars.
mtext(valueNames, side=1, las=3, at=bp)
mtext(c("Men","Women"), side=1, at=c(bp[3], bp[8]), line=8)
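On the ??? question above, a sketch of one possible answer (my suggestion, not from the original post): skip pos entirely. barplot() returns the x midpoints of the bars, so text() centers horizontally on them by default, and adj can anchor the rotated string vertically.

```r
# Sketch: center rotated labels inside bars without pos/offset.
# barplot() returns the bar midpoints, so text() centers on the
# bar width by default; adj = c(0, 0.5) starts the rotated string
# at the given y and keeps it centered across the bar.
x <- rpois(10, lambda = 10)
bp <- barplot(x, col = "gray90")
text(bp[1], 1, "my first bar is great", srt = 90, adj = c(0, 0.5))
text(bp[8], 1, "but 8 is greater", srt = 90, adj = c(0, 0.5))
```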
[R] Ever see Process R exited abnormally with code 4?
I'm on a Windows XP student's computer. When we get busy and start running R jobs, it hangs at random with the hourglass showing, but the system's CPU is not running at 100%. We sit and watch for a while, and then try ctrl-alt-delete to kill the not-responding program. In this case, I'm able to reproduce the problem almost every time by starting Rcmdr and running some graphs. I suspect that there may be a hardware problem, possibly overheating, but I have no evidence for that guess. I can't find the error codes in the R docs (sorry). I think the following supplies the information that the posting guide asks for:

sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

library(Rcmdr)
Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: car
Rcmdr Version 1.4-7

Attaching package: 'Rcmdr'

The following object(s) are masked from package:tcltk :
tclvalue

x <- rnorm(100)
Loading required package: rgl
Loading required package: mgcv
This is mgcv 1.5-0 . For overview type `help("mgcv-package")'.

Process R exited abnormally with code 4 at Fri Mar 13 17:19:47 2009
Re: [R] Help with Function!
On Fri, Mar 13, 2009 at 6:28 PM, Lars Bishop lars...@gmail.com wrote: Dear All, I need to write 'n' functions of 'm' variables. The functions should be constructed according to the values of an (n x m) matrix of 1/0 values, as follows. For example, if row 1 is equal to, say, [1 0 ... 0 0], then f1 <- (1+x1); if row 2 is equal to, say, [1 1 1 0 ... 0 1], then f2 <- (1+x1)*(1+x2)*(1+x3)*(1+xm); if row n is equal to [0 1 0 1 1 0 ... 0], then fn <- (1+x2)*(1+x4)*(1+x5). Is there an efficient way to write these functions? Note that each f(x) is a function of m variables, independently of whether the function explicitly includes a given variable or not. Many thanks for your help! Regards,

indic <- matrix(c(1,1,0,0,0,1,0,0,1,0), ncol=5)
X <- data.frame(x1=3, x2=3, x3=4, x4=5, x5=6)
### Give this function 2 vectors: an indicator
### vector ind and a value vector Xin
myHomework <- function(ind, Xin){
  prod(Xin[ , which(ind==1)])
}
myHomework(indic[1,], X[1,])
myHomework(indic[2,], X[1,])

Seems to give the right answer. Let me know if we get an A. PJ
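A sketch elaborating on the reply (my addition, not from the thread): apply() runs the same selection over every row of the indicator matrix at once, and this version uses the (1+x) form the question actually asked for.

```r
# Hypothetical elaboration: compute all n function values in one pass.
# Each row of indic selects which variables enter the product of
# (1 + x_i); a row of all zeros would give the empty product, 1.
indic <- matrix(c(1,1,0,0,0,1,0,0,1,0), ncol = 5)
xvals <- c(x1 = 3, x2 = 3, x3 = 4, x4 = 5, x5 = 6)
fvals <- apply(indic, 1, function(ind) prod((1 + xvals)[ind == 1]))
fvals
# row 1 selects x1, x5: (1+3)*(1+6) = 28
# row 2 selects x1, x3: (1+3)*(1+4) = 20
```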
Re: [R] How to combine xtable and minipage with Sweave ?
On Fri, Mar 13, 2009 at 12:17 PM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 3/13/2009 12:07 PM, Ptit_Bleu wrote: Thanks Dieter for the link. You can use \includegraphics explicitly yourself, and avoid the automatic code generated by Sweave. For example,

<<testfn, fig=true, include=false>>=
curve(f, from = 1, to = 5)
@

That generates the figure, but doesn't include it; then

\begin{figure}
\begin{center}
\includegraphics{-testfn}
\caption{The function $f(x) = |x-3.5| + (x-2)^2$.\label{fig:testfn}}
\end{center}
\end{figure}

will do the include. You need to name the chunk that creates the figure for this to be workable, and if you have set a prefix in your SweaveOpts, you need to use it in the \includegraphics call. (Actually I forget if the hyphen is needed; I stripped off a prefix from the real example to show you this.) Duncan Murdoch

Dear Duncan: Thanks for the tip. This is very useful to me. Just to clear up the filename question, I worked up an example for testing. For me it is necessary to include the name of the Rnw/tex file as a prefix in the \includegraphics command. Maybe that's just me, or Ubuntu, or TeXLive 2007. Here is a working file, SweaveExample.Rnw. In order to make this work, notice the figure names in \includegraphics:

\documentclass[english]{article}
\usepackage[T1]{fontenc}
\usepackage[latin9]{inputenc}
\usepackage{babel}
\begin{document}
<<testfn, fig=true, include=false>>=
curve(sin, from = 1, to = 5)
@
<<testfn2, fig=true, include=false>>=
curve(sin, from = 1, to = 5)
@
%
\begin{figure}
\caption{My Figure}
\includegraphics{SweaveExample-testfn}
\end{figure}
%% works without the float too:
\includegraphics{SweaveExample-testfn2}
\end{document}
Re: [R] Problem with figure size when embedding fonts
On Sat, Mar 14, 2009 at 1:51 PM, Frank E Harrell Jr f.harr...@vanderbilt.edu wrote: Dear Colleagues: I need to make a graphic that uses the Nimbus rather than the Helvetica font family so that the font can be embedded in the encapsulated postscript file. This is to satisfy a requirement from a journal for electronic submission of figures. I do the following:

postscript('lowess.eps', onefile=FALSE, pointsize=18, horizontal=FALSE, family='NimbusSan')
spar(mfrow=c(3,2))
. . .
dev.off()

At this point lowess.eps looks fine. When I run:

embedFonts('lowess.eps')

or

embedFonts('lowess.eps', options='-sPAPERSIZE=letter')

the figures are wider and the right panel of the 3x2 matrix of plots expands past the paper edge. Advice welcomed.

Hello Frank: This is an interesting post. I had not played with this until you brought it to my attention. I made a little progress. It appears to me either that there is something wrong with embedFonts as applied to postscript files, or that you and I need to learn a lot of ghostscript options. I created a small test and I see the same problem you do -- the margins bump if you embed the fonts. I think the problem may be worse than that; it looks like embedFonts wrecks the bounding box of the eps file. Do this:

x <- rnorm(100)
y <- rnorm(100)
ml <- loess(y ~ x)
plot(x, y)
lines(x[order(x)], fitted(ml)[order(x)])

postscript('lowess.eps', onefile=FALSE, pointsize=18, horizontal=FALSE, family="NimbusSan")
plot(x, y)
lines(x[order(x)], fitted(ml)[order(x)])
text(-1, 1, "Really big writing")
dev.off()

Copy lowess.eps to a safe place, then run

embedFonts("lowess.eps")

When you compare the two eps files, the margins are quite different. I see the difference more sharply when I add the features I usually have in postscript:

postscript('lowess.eps', onefile=FALSE, pointsize=18, horizontal=FALSE, family="NimbusSan", height=6, width=6, paper="special")
plot(x, y)
lines(x[order(x)], fitted(ml)[order(x)])
text(-1, 1, "Really big writing")
dev.off()

If you run that, save a copy of lowess.eps, then run embedFonts, you see the new lowess.eps no longer has the proper bounding box.
Here's the top of lowess-orig.eps:

%!PS-Adobe-3.0 EPSF-3.0
%%DocumentNeededResources: font NimbusSanL-Regu
%%+ font NimbusSanL-Bold
%%+ font NimbusSanL-ReguItal
%%+ font NimbusSanL-BoldItal
%%+ font StandardSymL
%%Title: R Graphics Output
%%Creator: R Software
%%Pages: (atend)
%%BoundingBox: 0 0 432 432
%%EndComments
%%BeginProlog

On the other hand, the one that results from embedFonts is not EPS anymore, and the bounding box is, well, confusing:

%!PS-Adobe-3.0
%%Pages: (atend)
%%BoundingBox: 0 0 0 0
%%HiResBoundingBox: 0.00 0.00 0.00 0.00
%.
%%Creator: GPL Ghostscript 863 (pswrite)
%%CreationDate: 2009/03/15 20:00:36
%%DocumentData: Clean7Bit
%%LanguageLevel: 2
%%EndComments
%%BeginProlog

It appears to me that running the eps output file through ps2epsi will re-create a tight bounding box; perhaps that is sufficient for your need. Rather than wrestle with postscript/ghostscript, I re-did this on a pdf output device. It appears to me that the margin placement there is not affected by embedFonts:

pdf('lowess.pdf', onefile=FALSE, pointsize=18, family="NimbusSan", height=6, width=6, paper="special")
plot(x, y)
lines(x[order(x)], fitted(ml)[order(x)])
text(-1, 1, "Really big writing")
dev.off()

I believe it is important to specify paper="special" here. I wondered about your experience with par(mfcol=...). Lately, I've had more luck printing out the smaller separate components and putting them on the same LaTeX figure. That way, I have a little more control, and the fonts in the figure captions are identical to my text, which I like. PJ
Re: [R] Creating dataframe names on the fly?
On Fri, Mar 20, 2009 at 7:18 PM, science! karthik@gmail.com wrote: I am aware that it is easily possible to create var names on the fly, e.g. assign(paste("m", i, sep=""), j), but is it possible to assign dataframes to variables created on the fly? E.g., if I have a dataframe called master and I wanted to subset parts of those data into separate dataframes, I could do: m1 <- subset(master, master$SAMPLE=='1'); m2 <- subset(master, master$SAMPLE=='2'); . but I would like to do this in a loop. Can someone give me suggestions on how to accomplish this? I tried assign(paste("m", i, sep=""), subset(master, master$SAMPLE==i)) with no success.

Are you sure you really need to name these dataframes? Here's a workaround that I use for these cases. Create your new data frames and add them to a list, as in

myframes <- list(subset(master, master$SAMPLE=='1'),
                 subset(master, master$SAMPLE=='2'))

Then when you want to use these things, you can get the first one as myframes[[1]] or the second as myframes[[2]]. You can name the objects inside the list:

names(myframes) <- c("A", "B")

This is just as good as referring to them by name, in my experience. It also has the benefit that, because your dataframes are in a list, you can use features like lapply to do things for each dataset. I'm reading ?assign now, and it appears you can actually name these things:

name <- paste("fred", 4, sep="")
x <- data.frame(py=rnorm(10))
assign(name, x)
ls()
[1] "fred4" "name" "x"
name
[1] "fred4"
fred4
            py
1  -1.04243477
2  -0.66475049
3  -0.08576428
4   0.64369356
5  -0.06828696
6   1.15710627
7  -2.45041700
8   0.40139655
9   0.27320936
10 -0.98028020

pj
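A footnote (my addition, not from the thread): for the specific task of splitting one data frame by a grouping variable, base R's split() builds the whole named list in one call. A sketch, assuming a master data frame with a SAMPLE column like the poster's:

```r
# Sketch: split() returns a named list of sub-data-frames,
# one element per level of the grouping variable.
master <- data.frame(SAMPLE = rep(c("1", "2"), each = 5),
                     y = rnorm(10))
myframes <- split(master, master$SAMPLE)
names(myframes)        # list elements are named by group
nrow(myframes[["1"]])  # the SAMPLE == "1" rows
```

This pairs naturally with lapply(myframes, ...) for per-group work, which was the point of the list-based advice above.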
Re: [R] Goodness of fit for negative binomial model
On Fri, Mar 20, 2009 at 8:03 PM, t c mudiver1...@yahoo.com wrote: Dear R list, I am using glm.nb in the MASS package to fit negative binomial models to data on manta ray abundance, and AICctab in the bbmle package to compare model IC. However, I need to test the goodness of fit of the full model, and have not been able to find a Pearson's chi-squared statistic in any of the output. Am I missing it somewhere? Is there a way to run the test using either chisq.test() or goodfit()? I would appreciate any suggestions on how to test for goodness of fit in negative binomial models. Thanks, Tim Clark

I found myself wondering if the chi-square you get from anova() is the Pearson one you want (see ?anova.negbin). Just an example:

library(MASS)
example(glm.nb)
anova(quine.nb3)

Analysis of Deviance Table
Model: Negative Binomial(1.9284), link: log
Response: Days
Terms added sequentially (first to last)

                Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                              145    272.291
Sex              1    1.705       144    270.586     0.192
Sex:Age          6   30.202       138    240.384 3.599e-05
Sex:Eth          2   20.461       136    219.923 3.606e-05
Sex:Lrn          2    8.459       134    211.465     0.015
Sex:Eth:Lrn      2   18.287       132    193.178 1.069e-04
Sex:Age:Lrn      4    8.649       128    184.529     0.070
Sex:Age:Eth      6    9.503       122    175.025     0.147
Sex:Age:Eth:Lrn  4    7.572       118    167.453     0.109

Warning message:
In anova.negbin(quine.nb3) : tests made without re-estimating 'theta'

That warning about re-estimating theta concerns me a bit. If that's not the correct Pearson statistic, I bet you can get what you need if you take the Pearson residuals and calculate it yourself. I am looking at ?residuals.glm and I note you can get Pearson residuals. If you have a dusty old stats book with the formula, I bet you can get it done. pj -- Paul E.
Johnson, University of Kansas
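To make the residual-based suggestion at the end concrete, a sketch (my addition, not code from the thread): the Pearson goodness-of-fit statistic is the sum of squared Pearson residuals, referred to a chi-squared distribution on the residual degrees of freedom. The model formula below is illustrative, not the poster's manta ray model.

```r
# Sketch: Pearson chi-square goodness-of-fit for a negative
# binomial GLM, computed from the Pearson residuals.
library(MASS)
fit <- glm.nb(Days ~ Sex + Age, data = quine)  # illustrative model
X2 <- sum(residuals(fit, type = "pearson")^2)  # Pearson statistic
df <- fit$df.residual                          # n - number of coefficients
p  <- pchisq(X2, df = df, lower.tail = FALSE)  # GOF p-value
c(X2 = X2, df = df, p.value = p)
```

A large X2 relative to df (small p) suggests the model does not fit; note this ignores the uncertainty in the estimated theta, the same caveat anova.negbin warns about.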
Re: [R] Plot and Boxplot in the same graph
On Fri, Mar 20, 2009 at 10:02 PM, johnhj jhar...@web.de wrote: Hi, Is it possible to use the plot() function and the boxplot() function together? I want to plot a simple graph and, additionally to the graph, boxplots at certain places. I have imagined plotting the graph a little bit transparent and showing boxplots at certain places in the same graph. Is it possible to do it in this way? greetings, johnh --

Run the boxplot first, then use points() or other subsidiary plot functions to add the points to the figure:

y <- rnorm(100)
x <- gl(2, 50)
boxplot(y ~ x)
points(x, y)
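A slightly fuller sketch of the same overlay idea (my addition): jitter the group positions so the raw points don't all stack on one vertical line over each box.

```r
# Sketch: boxplot first, then overlay the raw observations.
# jitter() spreads the points horizontally around the box
# positions (1 and 2) so they remain individually visible.
set.seed(1)
y <- rnorm(100)
x <- gl(2, 50)
boxplot(y ~ x)
points(jitter(as.numeric(x), amount = 0.08), y,
       pch = 16, col = "gray40")
```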
[R] I want to use Sweave, but only sometimes
Does anybody have a workable system to run an Rnw document through Sweave when necessary, but to just run it through LaTeX if no new R calculations are needed? I.e., the figures already exist, I do not need R to do more work for me, so I send the document straight to LaTeX. I want to leave open the option that I might need to run the document through Sweave in the future, so I don't want to just convert the document to pure LaTeX format. Here's why I think this must be possible. On this list, I saw one of you explain this approach to working with figures in Sweave docs. Instead of using the automatic figure inclusion approach like this:

<<testfn2, fig=true>>=
curve(sin, from = 1, to = 55)
@

Do this instead:

\SweaveOpts{prefix.string=foo/bar}
<<testfn2, fig=true, include=false>>=
curve(sin, from = 1, to = 55)
@
\begin{figure}
\caption{My Figure}
\includegraphics{foo/bar-testfn2}
\end{figure}

As long as the figures are saved in the directory foo, then LaTeX will find them, I think (hope!). Then if I could tell LaTeX to ignore the <<...>>= ... @ chunks, it *seems* to me the Rnw file would be processed successfully without further hassle. I am sorry if this has been asked and answered before; I may not know the magic words for searching for past solutions. pj
Re: [R] Group by in R
On Mon, Apr 13, 2009 at 8:56 AM, Nick Angelou nikola...@yahoo.com wrote:

data
   X1 X2 X3 X4
1   1  2  2  1
2   1  1  2  2
3   1  1  2  2
4   2  2  1  2
5   1  1  2  2
6   2  2  1  2
7   1  1  2  1
8   2  2  1  2
9   1  2  1  1
10  1  1  2  2

sqldf("select X1, X2, X3, X4, count(*) CNT from data group by X1, X2, X3, X4 ORDER BY X4, X1, X2, X3")

  X1 X2 X3 X4 CNT
1  1  1  2  1   1
2  1  2  1  1   1
3  1  2  2  1   1
4  1  1  2  2   4
5  2  2  1  2   3

The counts are fine, though it's not exactly what I need. I need a kind of contingency table:

                          | levels of X4 |
unique triplets of X1:X3  |   1  |   2  |
--------------------------------------
1 1 1 | 0 0
1 1 2 | 1 4
1 2 1 | 1 0
1 2 2 | 1 0
2 1 1 | 0 0
2 1 2 | 0 0
2 2 1 | 0 3
2 2 2 | 0 0

So the final result should be a table structure like:

0 0
1 4
1 0
1 0
0 0
0 0
0 3
0 0

I propose this way to get the numbers you want. I create a new variable to represent the values of the three, then make a table:

md <- matrix(c(1,2,2,1,1,1,2,2,1,1,2,2,2,2,1,2,1,1,2,2,2,2,1,2,1,1,2,1,2,2,1,2,1,2,1,1,1,1,2,2), ncol=4)
dat <- as.data.frame(md)
names(dat) <- c("x1","x2","x3","x4")
newvar <- factor(paste(dat$x1, dat$x2, dat$x3, sep="-"))
table(newvar, dat$x4)

Behold:

table(newvar, dat$x4)
newvar  1 2
  1-1-1 1 0
  1-2-1 1 0
  1-2-2 1 3
  2-1-1 1 0
  2-1-2 1 0
  2-2-1 1 0
  2-2-2 0 1
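An alternative sketch (my addition, not from the thread) that skips the hand-pasted label column: interaction() builds the triplet factor directly, and because it keeps all level combinations by default, the resulting table includes the zero-count rows the poster wanted.

```r
# Sketch: cross-tabulate the (x1, x2, x3) triplets against x4.
# interaction() keeps all 2*2*2 = 8 level combinations, so the
# table has a row even for triplets that never occur in the data.
md <- matrix(c(1,2,2,1,1,1,2,2,1,1,2,2,2,2,1,2,1,1,2,2,
               2,2,1,2,1,1,2,1,2,2,1,2,1,2,1,1,1,1,2,2), ncol = 4)
dat <- as.data.frame(md)
names(dat) <- c("x1", "x2", "x3", "x4")
tab <- table(interaction(dat$x1, dat$x2, dat$x3, sep = "-"), dat$x4)
tab
```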
[R] How do you specify font family in png output; png cross-platform issues
For teaching purposes, I prepared a little R program. I want to give this to students who can run it and dump out many formats and then compare their use in LaTeX documents. I do not have too much trouble with xfig or postscript format, but I've really run into a roadblock where png files are concerned. My original problem was that the png device does not accept a family option. How can I have png output with the Times family to compare with the postscript or pdf output? While searching for information on this, I discovered there have been a lot of R changes in png support. If I give this script to people with Mac or Windows, what are the chances that it will work? If I'm reading the png help page correctly, there are different types available, Xlib and cairo, but I don't understand what all that means for sending a program like this across systems. (fear the worst, but ask hoping for best). As far as I understand it, the paper=special option is needed so that the eps or pdf output will fit into a document without creating really huge margins around the graph. Correct? 
x <- rnorm(333)
y <- rnorm(333)
plot(x, y, xlab="Input Variable", ylab="Output Variable")

xfig(file="testplot.fig", horizontal=FALSE, height=6, width=6, family="Times")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()

postscript(file="testplot-1.eps", horizontal=FALSE, height=6, width=6, family="Times", onefile=FALSE, paper="special")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()

postscript(file="testplot-2.eps", horizontal=FALSE, height=4, width=4, family="Times", onefile=FALSE, paper="special")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()

pdf(file="testplot-1.pdf", height=6, width=6, family="Times", onefile=FALSE, paper="special")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()

png(file="testplot-1.png", height=350, width=550, type="Xlib")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()

png(file="testplot-2.png", height=350, width=550, type="cairo")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()

Can I bother you about one last png issue? While searching r-help, I see posts about the difference in png output between type "Xlib" and "cairo". For reasons I do not understand, ordinary viewers like GQview or Firefox make cairo-produced png files look blurry (in the words of posts on r-help). The png output from type="Xlib" is not blurry. This raises another level of confusion about this exercise I'm devising. Does R for Windows, as provided on CRAN, use Xlib for png? pj
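On the original question of giving png() a font family: newer versions of R do accept a family argument on png() when a cairo-based type is used. A sketch under that assumption (cairo availability varies by build, and the font actually chosen depends on what fontconfig finds on the system):

```r
# Sketch: request a Times-family font on a cairo png device.
# Guarded, because not every R build has cairo support.
f <- tempfile(fileext = ".png")
if (capabilities("cairo")) {
  png(f, width = 550, height = 350, type = "cairo", family = "Times")
  plot(rnorm(50), rnorm(50),
       xlab = "Input Variable", ylab = "Output Variable")
  dev.off()
}
```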
Re: [R] PCALG Package
This means you need to install the Rgraphviz package. Have you tried? For me, Rgraphviz is not in CRAN, but it is required by the package you want. Rgraphviz is hosted in Bioconductor, so you have to install it through that route. http://www.bioconductor.org/packages/release/bioc/html/Rgraphviz.html After that, you re-install the other package you really wanted.

library(pcalg)
Loading required package: MASS
Loading required package: graph
Loading required package: robustbase
Loading required package: Rgraphviz
Loading required package: grid
Loading required package: ggm

Attaching package: 'ggm'

The following object(s) are masked from package:graph :
edgeMatrix

Loading required package: mnormt

On Tue, Jan 27, 2009 at 12:17 PM, Tibert, Brock btib...@bentley.edu wrote: I cannot even get the package to run. I installed the package, and it is telling me I need Rgraphviz. I was told it was included in the Bioconductor package, but that did not work either. The error message I routinely get concerns a missing Rgraphviz package. I have searched the internet, saw a place to install it, and attempted that as well, but to no avail. I am stumped. Does it work for you? If so, when did you install the package? Many thanks, Brock

library(pcalg)
Loading required package: MASS
Loading required package: graph
Loading required package: robustbase
Loading required package: Rgraphviz
Error: package 'Rgraphviz' could not be loaded
In addition: Warning message:
In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) :
there is no package called 'Rgraphviz'
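For completeness, a sketch of the Bioconductor install route as it worked in the R 2.x era (the biocLite URL and helper were Bioconductor's convention at that time; this is my addition, not part of the original reply):

```r
# Sketch: install Rgraphviz from Bioconductor (R 2.x era),
# then the CRAN package that depends on it. Requires network
# access and, for Rgraphviz, the graphviz system libraries.
source("http://www.bioconductor.org/biocLite.R")
biocLite("Rgraphviz")
install.packages("pcalg")
library(pcalg)
```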
[R] I want axes that cross
Hello, everybody. A student asked me a how-to question I can't answer. We want the length of the drawn axes to fill the full width and height of the plot, like so:

|
|   *
|  *
| *
---|-

However, when we use plot with axes=FALSE and then use the axis commands to add the axes, they do not cross over each other. We get

|  *
| *
   *
 --

The axes are not wide enough to cover the entire range of the data. We do not want to add a box() command, which is a usual answer for this, because we don't want to draw on the top or the side. Here's my test case:

x <- rnorm(100)
z <- gl(2, 50)
y <- rnorm(100, mean = 1.8*as.numeric(z))
plot(x, y, type="n", axes=FALSE)
points(x, y, pch="$", cex=0.7, col=z)
axis(1, col="green", col.axis="green")
axis(2, col="red", col.axis="red")

Can I cause the 2 axes to cross as desired? The axis help says the axis is clipped at the plot region, but it seems to get clipped even more narrowly than that. I've been searching in r-help quite a bit and am a little surprised how difficult this is.
Re: [R] I want axes that cross
On Fri, Feb 13, 2009 at 1:51 PM, Marc Schwartz marc_schwa...@comcast.net wrote: on 02/13/2009 01:25 PM Paul Johnson wrote: Hello, everybody. A student asked me a how-to question I can't answer. We want the length of the drawn axes to fill the full width and height of the plot, like so: Paul, I am guessing that you want:

x <- rnorm(100)
z <- gl(2, 50)
y <- rnorm(100, mean = 1.8*as.numeric(z))
plot(x, y, type="n", axes=FALSE)
points(x, y, pch="$", cex=0.7, col=z)
axis(1, col="green", col.axis="green")
axis(2, col="red", col.axis="red")
# Draw the box like an "L" on the bottom and left only
box(bty = "l")

Note that you can specify which sides the 'box' is created upon by using the 'bty' argument. See ?box for more information.

Thanks. I did not find bty under ?box, but found it under ?par after you pointed it out. That does not get the correct output, however, because the black box covers over my 2 different colored axes. Even if I weren't color-conscious, it gives me this:

|
|
|___

not crossed axes, which I want:

 |
 |
_|__
 |

I'm putting in a seed so we will both see the same thing in this example.

set.seed(1233240)
x <- rnorm(100)
z <- gl(2, 50)
y <- rnorm(100, mean = 1.8*as.numeric(z))
plot(x, y, type="n", axes=FALSE)
points(x, y, pch="$", cex=0.7, col=z)
axis(1, col="green", col.axis="green")
axis(2, col="red", col.axis="red")
# MS recommends:
# Draw the box like an "L" on the bottom and left only
box(bty = "l")

Also, by default, the axes extend the range of 'x' and 'y' by 4%. You can use 'xaxs = "i"' and 'yaxs = "i"' in the plot() call to restrict the axes to the true ranges of 'x' and 'y'. This would be important, for example, when you want the lower left hand corner of the plot to be at exact coordinates such as 0,0.

I would be delighted if the axes really did reach 4% outside the data. But they don't. I've seen that same thing you are referring to in the documentation, but there's something wrong about it. In my example code, we should see the same thing now that I've put in a seed.
The axes are smaller than the data range, not equal to 1.04 times the data range. I see several observations in the graph that are off the charts, they are above the highest value of the y axis, or below the lowest axis value. Similarly, there are observations smaller than the low end of the x axis and bigger than the largest x axis value. The 4% may be the plot region's size, but it is surely not the length of the axis that is drawn? See ?par for more information. HTH, Marc Schwartz
Re: [R] I want axes that cross
On Fri, Feb 13, 2009 at 1:42 PM, Daniel Moreira daniel.more...@duke.edu wrote: Try defining the argument 'pos' in the axis command, like:

x <- rnorm(100)
z <- gl(2, 50)
y <- rnorm(100, mean = 1.8*as.numeric(z))
plot(x, y, type="n", axes=FALSE)
points(x, y, pch="$", cex=0.7, col=z)
axis(1, col="green", col.axis="green", pos=0)
axis(2, col="red", col.axis="red", pos=-2)

Daniel Moreira, MD

If you actually ran that code and still suggest it as the fix, then I think you must be joking. Pushing the axes into the middle of the data cloud in order to make them cross certainly does not make a very nice looking plot. Not only are there observations outside the area framed by the axes, but there are axis labels that are overlapped by observations and by the axes themselves. PJ
Re: [R] I want axes that cross
On Fri, Feb 13, 2009 at 3:14 PM, Marc Schwartz marc_schwa...@comcast.net wrote: on 02/13/2009 02:19 PM Paul Johnson wrote: On Fri, Feb 13, 2009 at 1:51 PM, Marc Schwartz OK, so given all of the above, something like the following should work:

set.seed(1233240)
x <- rnorm(100)
z <- gl(2, 50)
y <- rnorm(100, mean = 1.8 * as.numeric(z))
# Calculate a new range, subtracting a definable value
# from the min of each vector for the new minimum.
# Adjust the 0.25 as may be needed
X <- c(min(x) - 0.25, max(x))
Y <- c(min(y) - 0.25, max(y))
# Use 'X' and 'Y' here, not 'x' and 'y',
# so that the plot region is extended appropriately
plot(X, Y, type = "n", axes = FALSE, xlab = "x", ylab = "y")
points(x, y, pch = "$", cex = 0.7, col = z)
# DO use 'pos'...
axis(1, pos = Y[1], col = "green", col.axis = "green")
axis(2, pos = X[1], col = "red", col.axis = "red")
# get the plot region boundaries
usr <- par("usr")
segments(X[1], usr[3], X[1], usr[4], col = "red")
segments(usr[1], Y[1], usr[2], Y[1], col = "green")

HTH, Marc

Thanks, Marc and everybody. This last suggestion produces the graph I was trying for. The other approaches that people suggest, using pos or xaxp, produce other undesired changes in the tick marks or the position of the axes relative to the data. xaxp offers the promise of a more intuitive solution, except that, when using it, the tick marks are pushed off in a bad way. Your use of segments to draw the extensions of the axes was the first intuition I had, but I did not know about the trick you used to retrieve the size of the usr box. More secrets of par, revealed. Have you ever seen a drawing of the regions of an R plot with the terminology that is used for the parts? Until I saw your example code, I had not understood that the plot axes are placed at the absolute edge of the user plotting region, for example. -- Paul E.
Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
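For readers skimming the archive, the approach that worked can be boiled down to a minimal self-contained sketch (the padding value 0.25 and the random data are arbitrary, as in the thread): pad the plot region, place each axis with pos= at the padded minimum, then extend it to the region edges with segments().

```r
## Crossing axes outside the data cloud: pad, position, extend.
set.seed(42)
x <- rnorm(100)
y <- rnorm(100, mean = 2)
X <- c(min(x) - 0.25, max(x))   # padded ranges so the axes sit
Y <- c(min(y) - 0.25, max(y))   # just outside the data cloud
plot(X, Y, type = "n", axes = FALSE, xlab = "x", ylab = "y")
points(x, y)
axis(1, pos = Y[1])             # horizontal axis at the padded minimum
axis(2, pos = X[1])             # vertical axis at the padded minimum
usr <- par("usr")               # c(x1, x2, y1, y2) of the plot region
segments(X[1], usr[3], X[1], usr[4])  # extend the axes so they
segments(usr[1], Y[1], usr[2], Y[1])  # run edge to edge and cross
```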
[R] SPSS data import: problems and workarounds for GSS surveys
I'm using R 2.8.1 on Ubuntu 8.10. I'm writing partly to ask what's wrong, partly to tell other users who search that there is a workaround. The General Social Survey is a long-standing series of surveys provided by NORC (National Opinion Research Center). I have downloaded some years of the survey data in SPSS format (here's the site: http://www.norc.org/GSS+Website/Download/SPSS+Format/). When I try to import using foreign, I get an error like so:

library(foreign)
dat <- read.spss("gss2006.sav", to.data.frame = TRUE, trim.factor.names = TRUE)
Error in inherits(x, "factor") : object "cp" not found
In addition: Warning messages:
1: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable TVRELIG
2: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable SEI
3: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable FIRSTSEI
4: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable PASEI
5: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable MASEI
6: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable SPSEI
7: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 0.75 for variable YEARSJOB
8: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File-indicated character representation code (1252) looks like a Windows codepage

No dat object is created from this. I have found a workaround.
I installed PSPP version 0.6.0 and used it to open the sav file, then re-save it in SPSS sav format. That creates an SPSS file that foreign's function can open. I still see the warnings about redundant value labels, but as far as I can see these are harmless. A working object is obtained like so:

dat <- read.spss("gss-pspp.sav")
Warning messages:
1: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable TVRELIG
2: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 0.75 for variable YEARSJOB
3: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable SEI
4: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable FIRSTSEI
5: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable PASEI
6: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable MASEI
7: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable SPSEI

There is still some trouble with the importation of this SPSS file, however. It has the symptoms of being a non-rectangular data array, I think.
What do you think about these warnings: dat - read.spss(gss-pspp.sav,to.data.frame=T) There were 22 warnings (use warnings() to see them) warnings() Warning messages: 1: In read.spss(gss-pspp.sav, to.data.frame = T) : gss-pspp.sav: File contains duplicate label for value 99.9 for variable TVRELIG 2: In read.spss(gss-pspp.sav, to.data.frame = T) : gss-pspp.sav: File contains duplicate label for value 0.75 for variable YEARSJOB 3: In read.spss(gss-pspp.sav, to.data.frame = T) : gss-pspp.sav: File contains duplicate label for value 99.9 for variable SEI 4: In read.spss(gss-pspp.sav, to.data.frame = T) : gss-pspp.sav: File contains duplicate label for value 99.9 for variable FIRSTSEI 5: In read.spss(gss-pspp.sav, to.data.frame = T) : gss-pspp.sav: File contains duplicate label for value 99.9 for variable PASEI 6: In read.spss(gss-pspp.sav, to.data.frame = T) : gss-pspp.sav: File contains duplicate label for value 99.9 for variable MASEI 7: In read.spss(gss-pspp.sav, to.data.frame = T) : gss-pspp.sav: File contains duplicate label for value 99.9 for variable SPSEI 8: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] : longer object length is not a multiple of shorter object length 9: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] : longer object length is not a multiple of shorter object length 10: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] : longer object length is not a multiple of shorter object length 11: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] : longer object length is not a multiple of shorter object length 12: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] : longer object length is not a multiple of shorter object length 13: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] : longer object length is not a multiple
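Not a fix for the underlying parse error, but a sketch of one way to isolate trouble like this (file and variable names taken from the messages above; whether it helps with this particular file is an assumption): skip the data-frame conversion, build the data frame yourself, and inspect the value-label table that the warnings complain about. The label.table attribute is documented in ?read.spss.

```r
## Import without data-frame conversion so a single bad variable
## cannot abort the whole import, then convert manually.
library(foreign)
gss <- read.spss("gss-pspp.sav", to.data.frame = FALSE,
                 use.value.labels = FALSE)
dat <- as.data.frame(lapply(gss, unclass))
## Inspect the duplicated value labels the warnings refer to:
labs <- attr(gss, "label.table")
labs[["TVRELIG"]]                    # variable named in the warnings
any(duplicated(labs[["TVRELIG"]]))   # confirms the duplicate label
```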
Re: [R] vertically aligned X axis labels disappear off R Graphics window
On Wed, Feb 25, 2009 at 12:51 PM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote: R User R User wrote: Hi guys, I'm evaluating R for basic data exploration. I produce a bar plot of the data, with the x axis labels aligned vertically. However, the start of labels longer than about 10 characters is cut off by the bottom of the graphics window.

Been there, did that :) Run these steps. Run par with various values to change the margins. The size of the figure is fixed from the outside edges, so when you make the margin bigger, the drawing gets squeezed.

x <- rpois(14, lambda = 10)
mynames <- c(rep("Really Long Name", 13), "Really Long Name Long Long")
barplot(x, names = mynames, las = 2)
## RSiteSearch("barplot names margins")
barplot(x, names = abbreviate(mynames), las = 2)
par(mar = c(9.5, 3, 0.5, 0.3))
barplot(x, names = mynames, las = 2)
## I believe the following should be the same, but it changes
## the margin only once, and further calls like it have no
## margin-changing effect. So you need to run par over again
## before the next barplot.
barplot(x, names = mynames, las = 2, mar = c(9.5, 3, 0.5, 0.3))

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
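To make the margin bookkeeping explicit (a sketch of the same idea, not new functionality): settings made with par(mar=) persist for the life of the device, so the usual pattern is to save the old values and restore them afterwards.

```r
## Save old settings, widen the bottom margin, plot, restore.
x <- rpois(14, lambda = 10)
mynames <- rep("Really Long Name", 14)
op <- par(mar = c(9.5, 3, 0.5, 0.3))  # c(bottom, left, top, right), in lines
barplot(x, names.arg = mynames, las = 2)
par(op)                               # margins back to what they were
```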
Re: [R] adding value labels on Interaction Plot
On Wed, Mar 4, 2009 at 10:52 AM, Dimitri Liakhovitski ld7...@gmail.com wrote: Hello - and sorry for what might look like a simple graphics question. I am building an interaction plot for d:

d <- data.frame(xx = c(3,3,2,2,1,1), yy = c(4,3,4,3,4,3),
                zz = c(5.1,4.4,3.5,3.3,-1.1,-1.3))
d[[1]] <- as.factor(d[[1]])
d[[2]] <- as.factor(d[[2]])
print(d)
interaction.plot(d$xx, d$yy, d$zz, type = "b", col = c("red", "blue"),
                 legend = FALSE, lty = c(1,2), lwd = 2, pch = c(18,24),
                 xlab = "X Label", ylab = "Y Label", main = "Chart Label")

I am trying and not succeeding in adding Y values ("value labels" in Excel-speak) near the data points on the lines of the graph. I understand that I might have to use text. But how do I tell text to use the actual coordinates of the dots on the lines? Thank you very much!

I'm not understanding your trouble, exactly. I had not heard of interaction.plot before, so I ran your code, and it is an interesting function. Thanks for providing the working example. I can help you with text. It is easy to add text. A command like

text(1.3, 1, "whatever", pos = 3)

will put the word "whatever" on top of the coordinates x = 1.3, y = 1. (If you leave out pos = 3, R centers the text on the coordinates.) If you need to find x and y before running that, you can: the locator function returns coordinates. Run locator(1) and then left-click on a point in the graph. The coordinates will pop out on the screen. And you can make the text placement depend on locator:

text(locator(1), "whatever", pos = 3)

I don't think you want to do that work interactively, however. It can be automated. You can add a column of names to your data frame and more or less automate the plotting as well. I did this to test:

mylabels <- c("A", "B", "C", "D", "E", "F")
text(d$xx, d$zz, mylabels, pos = 3)

This almost works perfectly, but it plops the labels in the wrong spots. I'd like to change the command so that the text for the red line sits on the right, while the text for the blue line sits on the left.

It appears to me your variable yy is the one that separates the 2 lines. Correct? I observe:

as.numeric(d$yy)
[1] 2 1 2 1 2 1

We want the blue ones on the left; for them we need pos = 2. For the others, we want pos = 4. Ach. I tried this

text(d$xx, d$zz, mylabels, pos = 2 * as.numeric(d$yy))

but it comes out backward. So how about this:

text(d$xx, d$zz, mylabels, pos = as.numeric(d$yy))

That positions the red ones below the line and the blue ones to the left. That doesn't look too bad to me. Anyway, I think you get the idea. If you wanted to simply stop plotting the circles and put the letters right on the spot, that would be easy as well. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
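One more step in the same direction (a sketch; which group ends up left vs. right depends on the factor-level ordering, so check it on your own plot): pos is vectorized, so a single text() call can push one line's labels left (pos = 2) and the other's right (pos = 4).

```r
## Left/right label placement driven by the grouping factor.
d <- data.frame(xx = factor(c(3, 3, 2, 2, 1, 1)),
                yy = factor(c(4, 3, 4, 3, 4, 3)),
                zz = c(5.1, 4.4, 3.5, 3.3, -1.1, -1.3))
interaction.plot(d$xx, d$yy, d$zz, type = "b", legend = FALSE)
mylabels <- c("A", "B", "C", "D", "E", "F")
## yy has two levels, so as.numeric gives 1 or 2;
## map one group to pos = 2 (left), the other to pos = 4 (right)
text(as.numeric(d$xx), d$zz, mylabels,
     pos = ifelse(as.numeric(d$yy) == 1, 2, 4))
```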
Re: [R] Table Transformation
On Wed, Mar 4, 2009 at 11:58 AM, Christian Pilger christian.pil...@gmx.net wrote: Dear R-experts, recently I started to discover the world of R. I came across a problem that I was unable to solve by myself (including searches in R-help, etc.). I have a flat table similar to

key1   key2 value1
abcd_1 BP   10
abcd_1 BSMP 1A
abcd_1 PD   25
abcd_2 BP   20
abcd_3 BP   80
abcd_4 IA   30
abcd_4 PD   70
abcd_4 PS   N

I wish to transform this table to obtain the following result:

       key2
key1   BP  BSMP  IA  PD  PS
abcd_1 10  1A        25
abcd_2 20
abcd_3 80
abcd_4           30  70  N

I think we would say that a data frame of the first type is in the "long" format, while the other one you want is in the "wide" format. I've done changes like that with the reshape function that is in the stats package. This example you propose is like making one column for each country, where key1 is like the year in which the observation is made. Right? You don't have an easily cut-and-pasteable code example, so I've generated a little working example. Here, x1 is key1 and x2 is key2.

x1 <- gl(4, 5, labels = c("c1", "c2", "c3", "c4"))
x1
 [1] c1 c1 c1 c1 c1 c2 c2 c2 c2 c2 c3 c3 c3 c3 c3 c4 c4 c4 c4 c4
Levels: c1 c2 c3 c4
x2 <- rep(1:5, 4)
df <- data.frame(x1, x2, y = rnorm(20))
df
   x1 x2           y
1  c1  1  0.02095747
2  c1  2  0.05926233
3  c1  3 -0.07561916
4  c1  4 -1.06272710
5  c1  5 -1.89202032
6  c2  1 -0.04549782
7  c2  2 -0.68333187
8  c2  3 -0.99151410
9  c2  4 -0.29070280
10 c2  5 -0.97655024
11 c3  1  0.33411223
12 c3  2 -0.24907340
13 c3  3 -0.25469819
14 c3  4  1.23956157
15 c3  5 -1.38162430
16 c4  1  0.50343661
17 c4  2 -0.58126964
18 c4  3  0.24256348
19 c4  4 -0.39398578
20 c4  5  0.01664450

reshape(df, direction = "wide", timevar = "x2", idvar = "x1")
   x1         y.1         y.2         y.3        y.4        y.5
1  c1  0.02095747  0.05926233 -0.07561916 -1.0627271 -1.8920203
6  c2 -0.04549782 -0.68333187 -0.99151410 -0.2907028 -0.9765502
11 c3  0.33411223 -0.24907340 -0.25469819  1.2395616 -1.3816243
16 c4  0.50343661 -0.58126964  0.24256348 -0.3939858  0.0166445

Your case will have many missings, but I think the idea is the same. HTH -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
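Applying the same reshape() call directly to a long table shaped like the original question (column names assumed from the post) shows what happens with the missings: key1/key2 combinations that do not appear in the long table simply come out as NA in the wide result.

```r
## Long-to-wide reshape on the poster's example; absent cells become NA.
long <- data.frame(
  key1  = c("abcd_1", "abcd_1", "abcd_1", "abcd_2", "abcd_3",
            "abcd_4", "abcd_4", "abcd_4"),
  key2  = c("BP", "BSMP", "PD", "BP", "BP", "IA", "PD", "PS"),
  value = c("10", "1A", "25", "20", "80", "30", "70", "N"),
  stringsAsFactors = FALSE)
wide <- reshape(long, direction = "wide", timevar = "key2", idvar = "key1")
wide   # columns value.BP, value.BSMP, value.PD, value.IA, value.PS; NAs fill the gaps
```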
Re: [R] using a noisy variable in regression (not an R question)
On Sat, Mar 7, 2009 at 11:49 AM, Juliet Hannah juliet.han...@gmail.com wrote: Hi, This is not an R question, but I've seen opinions given on non-R topics, so I wanted to give it a try. :) How would one treat a variable that was measured once, but is known to fluctuate a lot? For example, I want to include a hormone in my regression as an explanatory variable. However, this hormone varies in its levels throughout a day. Nevertheless, its levels differ substantially between individuals, so there is information there to use. One simple thing to try would be to form categories, but I assume there are better ways to handle this. Has anyone worked with such data, or could anyone suggest some keywords that may be helpful in searching for this topic? Thanks for your input.

From teaching econometrics, I remember that if the truth is y = b0 + b1*x1 + noise and you do not have a correct measure of x1, but rather something else like ex1 = x1 + noise, then the regression estimate of b1 is biased, generally attenuated. As far as I understand it, the technical solutions are not too encouraging. You can try to get better data, or possibly build an instrumental variables model, where you have other predictors of the true value of x1 in a first-stage model. I don't recall that I was able to persuade myself that that approach really solves anything, but many people recommend it. I suppose a key question is whether you can persuade your audience that ex1 = x1 + noise and whether that noise is well behaved. As I was considering your problem, I wondered whether there might be a mixed model approach. You hypothesize the truth is y = b0 + b1*x1 + noise, but you don't have x1. So suppose you reconsider the truth as a model with a random parameter, as in y = b0 + c1*ex1 + noise. Here ex1 is a fixed estimate of the hormone level for each observation, and c1 is a random, varying coefficient because the effect of the hormone fluctuates in an unmeasurable way.
Then you could try to estimate the distribution of c1. You have an interesting problem, I think. pj -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
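The attenuation claim above can be made precise. With the true model y = b0 + b1*x1 + u and the noisy measure ex1 = x1 + e, where e is uncorrelated with x1 and u, the OLS slope from regressing y on ex1 converges to a shrunken version of b1:

```latex
\operatorname{plim}\ \hat{b}_1
  \;=\; b_1 \cdot \frac{\sigma_{x_1}^2}{\sigma_{x_1}^2 + \sigma_{e}^2}
```

The multiplier is the "reliability ratio," which lies strictly between 0 and 1 whenever the measurement noise has positive variance, so the estimate is biased toward zero; this is the standard errors-in-variables attenuation result.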
Re: [R] Plotmath with values?
On Wed, Dec 31, 2008 at 10:47 AM, Mike Prager mike.pra...@noaa.gov wrote: I hope to use the plotmath facility to print titles that mix math and values of R variables. The help for plotmath has an example, which after repeated reading, I find baffling. Likewise, I have read the help file for substitute (wqhich seems to be needed) without ever understanding what it does, other than being used in some magic incantations. I would like to do something like this: dev.new() aa - round(pi,2) plot(1:3, 1:3, main = ~ a == aa) and have the main title read a = 3.14 but of course it reads a = aa. I agree this is a very difficult part of using R. I asked this exact same kind of question last year. If you go to an R-help email archive for April 2, 2008 Trouble combining plotmath, bquote, expressions you will find some discussion in the list. For a while, I got pretty excited and started working on better example code to put into the R distribution, but lost initiative when I saw the magnitude of the problem. This example code will show several usages of bquote and an alternative substitute approach. The result seems not to look so beautiful as I recall, but isn't that always the way it goes :). ### Filename: Normal1_2008.R ### Paul Johnson March 31, 2008 ### This code should be available somewhere in http://pj.freefaculty.org. 
If it is not, ### email me paulj...@ku.edu

mu <- 10.034
sigma <- 12.5786786
myx <- seq(mu - 3.5*sigma, mu + 3.5*sigma, length.out = 500)
myDensity <- dnorm(myx, mean = mu, sd = sigma)
# Here's one way to retrieve the values of mu and sigma and insert them into an expression
t1t2 <- bquote(paste("Normal(", mu == .(round(mu,2)), ',', sigma == .(round(sigma,2)), ")"))
plot(myx, myDensity, type = "l", xlab = "x", ylab = "Probability Density", main = t1t2)
t1t2 <- bquote(paste("Normal ( ", mu == .(round(mu,2)), ' , ', sigma^2 == .(round(sigma^2,2)), " )"))
### Note: the spaces manually inserted above are needed; otherwise plotmath
### overlaps the "l" of "Normal" with the first parenthesis
plot(myx, myDensity, type = "l", xlab = "x", ylab = "Probability Density", main = t1t2)
### Had difficulty finding syntax for substitute to combine symbols mu and sigma with values.
### The following is the best I can figure, no simpler or more obvious than the previous method.
## t1 <- substitute( mu == a, list(a = mu))
## t2 <- substitute( sigma == a, list(a = sigma))
## t1t2 <- bquote(paste("Normal(", .(t1), ",", .(t2), ")"))
t1t2 <- substitute( Normal ~~ group("(", list(mu == mu1, sigma^2 == sigma2), ")"),
                    list(mu1 = round(mu,2), sigma2 = round(sigma^2,2)))
plot(myx, myDensity, type = "l", xlab = "x", ylab = "Probability Density", main = t1t2)
plot(myx, myDensity, type = "l", xlab = "x", ylab = "Probability Density", main = t1t2, axes = FALSE)
axis(2, pos = mu - 3.6*sigma)
axis(1, pos = 0)
# bquote creates an expression that text plotters can use
t1 <- bquote( mu == .(mu))
text( mu, 0.00, t1, pos = 3)
ss <- 0.2 * max(myDensity)
arrows( x0 = mu, y0 = ss, x1 = mu + sigma, y1 = ss, code = 3, angle = 90, length = 0.1)
t2 <- bquote( sigma == .(round(sigma,2)))
text( mu + 0.5*sigma, 1.15*ss, t2)
normalFormula <- expression( f(x) == frac(1, sqrt(2*pi)*sigma) * e^{~~ - ~~ frac(1,2) ~~ bgroup("(", frac(x-mu,sigma), ")")^2})
text( mu + 0.5*sigma, max(myDensity) - 0.10 * max(myDensity), normalFormula, pos = 4)
### Theory says we should have about 2.5% of the area to the left of: -1.96 * std.dev
criticalValue <- mu - 1.96 * sigma
specialX <- myx[myx <= criticalValue]
### mark the critical value in the graph
text( criticalValue, 0, label = paste(round(criticalValue,2)), pos = 1)
specialY <- myDensity[myx < criticalValue]
polygon(c(specialX[1], specialX, specialX[length(specialX)]), c(0, specialY, 0),
        density = c(-110), col = "lightgray")
shadedArea <- round(pnorm(mu - 1.96 * sigma, mean = mu, sd = sigma), 4)
### Hard to position this text just right
al <- paste("Prob(", "x <= ", round(criticalValue,2), ")\n= ",
            "F(", round(criticalValue,2), ")\n= ", round(shadedArea,3), sep = "")
text( criticalValue - sigma, myDensity[length(specialX)], labels = al, pos = 3)
ss <- 0.1 * max(myDensity)
arrows( x0 = mu, y0 = ss, x1 = mu - 1.96*sigma, y1 = ss, code = 3, angle = 90, length = 0.1)
text( mu - 2.0*sigma, 1.15*ss,
      bquote(paste(.(round(criticalValue,2)), phantom(1) == mu - 1.96, " ", sigma, sep = " ")), pos = 4)

From a user's point of view -- one who has never written a parser nor taken a course in compilers -- what is needed is the nonexistent function "value", usable in plotmath expressions to produce the value of its argument, as

plot(1:3, 1:3, main = ~ a == value(aa))

How can this be done? THANKS! -- Mike Prager, NOAA, Beaufort, NC * Opinions expressed are personal and not represented otherwise. * Any use of tradenames does not constitute a NOAA endorsement. -- Paul E. Johnson
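For the archive, here is the short answer distilled from the thread, using the original poster's own example: bquote() evaluates anything wrapped in .( ) and splices the value into the expression.

```r
## The title reads "a = 3.14": .(aa) is replaced by the value of aa.
aa <- round(pi, 2)
plot(1:3, 1:3, main = bquote(a == .(aa)))
## substitute() can do the same job:
plot(1:3, 1:3, main = substitute(a == v, list(v = aa)))
```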
[R] Sweave documents have corrupted double quotes
I'm attaching a file foo.Rnw and I'm hoping some of you might run it through your R + LaTeX systems to find out if the double quotes in typewriter font turn out as black boxes (as they do for me). If you don't use Sweave, but you have a system with a working version of R and LaTeX, the file gives the instructions you need to process it. The file itself explains the problem. You can see the flawed output on my web site: http://pj.freefaculty.org/latex/foo.pdf I'm running Ubuntu Linux 8.10 with R 2.8.1 and TeXLive 2007 (which is provided with the distribution). This is not a new problem: I noticed it two years ago while using teTeX on Fedora Linux, so I doubt that it is specific to TeXLive. Back then, I took the path of least resistance and stopped using the typewriter font. That is becoming inconvenient, however. I would sincerely appreciate any pointers you have. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] Sweave documents have corrupted double quotes
On Fri, Jan 16, 2009 at 10:43 AM, David Winsemius dwinsem...@comcast.net wrote: Dear Dr Johnson; I'm not sure if you get copies of your posts. If you do can you check to see if the list-server kept the attachment? My copy did not have one. -- Best David winsemius Hm. Well, I do get the attachment, and don't understand why you don't. But if you are willing, you can get the file here, same directory as the dvi and pdf output: http://pj.freefaculty.org/latex/foo.Rnw -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sweave documents have corrupted double quotes
On Fri, Jan 16, 2009 at 11:06 AM, Vincent Goulet vincent.gou...@act.ulaval.ca wrote: Paul, The file did not make it to the list. Did you try loading Sweave with the 'noae' option, that is: \usepackage[noae]{Sweave} This *may* solve your issue. HTH Vincent Wow. That does fix it. I bow to you. I need to learn how this option can be put into the Rnw file itself . Are you the same Vincent Goulet who offers the customized Emacs for windows? (http://vgoulet.act.ulaval.ca/ressources/emacs). If so, thanks again for that. Keep up the good work. It has saved me and my students tons of time setting up Windows systems. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
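On the question of putting the option into the Rnw file itself: my understanding (worth verifying against your Sweave version) is that Sweave only appends \usepackage{Sweave} to the preamble when the document does not already load it, so declaring it yourself with the option should take precedence. A minimal preamble sketch:

```latex
% Minimal .Rnw preamble: load Sweave explicitly with the noae option,
% so Sweave does not add its own \usepackage{Sweave} later.
\documentclass{article}
\usepackage[noae]{Sweave}
\begin{document}
% text and code chunks go here
\end{document}
```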
Re: [R] Sweave documents have corrupted double quotes
Hey, everybody. I am concluding that this "Sweave wrecks quotation marks in typewriter font" business is a bug in Sweave. I've uploaded the foo.tex file so now you can see the Rnw, tex, and pdf files side by side: http://pj.freefaculty.org/latex As a previous poster noted, the Sweave instruction is added at the end of the preamble in foo.tex. However, moving the \usepackage{Sweave} to the beginning of the preamble has no effect. Same black boxes. It is true that one can work around this by inserting the declaration \usepackage[noae]{Sweave} into the Rnw file (or into the tex file). That solves the problem; however, the user is left with a suspicious feeling. What will go wrong in my output after [noae] is inserted? If nothing, why was it there in the first place? pj -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] Sweave documents have corrupted double quotes
On Sat, Jan 17, 2009 at 4:00 PM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 17/01/2009 4:29 PM, Ben Bolker wrote: Duncan Murdoch murdoch at stats.uwo.ca writes: R is open source, so this is no mystery: if you use [noae] then Sweave won't use the ae package in LaTeX. The problem you were having with quotes is that the ae package doesn't define those. If you choose not to use it, then you won't get the symbols that it does define. Duncan Murdoch So if this is a LaTeX bug, should we report it upstream to the maintainer (Lars Engebretsen, http://sites.engebretsen.ch/lars/ ) ? It is documented in the README for the ae package, so I don't think a new report is needed. Duncan Murdoch Defending Paul a little bit: from his point of view, Sweave is what broke things, and although the bug isn't in Sweave it would take a certain amount of digging to discover that ... From the ae-unaware user's point of view, adding Sweave is necessary and sufficient to produce the unwanted behavior. cheers Ben Bolker Maybe bug is not the right word. What do you call an undocumented effect in software that does not produce a pleasant result? :) The Sweave user manual makes no mention of the ae font. If you don't believe me, grab this: http://www.stat.uni-muenchen.de/~leisch/Sweave/Sweave-manual.pdf and search for ae. It does not mention the [noae] option. I never realized any ae fonts were used until this problem emerged. The ae readme http://www.ctan.org/get/fonts/ae/README doesn't have a warning about double quotes. The characters missing are mainly: eth, thorn, and the sami letter eng. Sometimes the £ character is also missing. For the typewriter fonts, the situation is worse. If it had said typewriter font is missing quotation marks it would have gotten my attention, but I would have had no reason to be reading the ae font README because there's nothing about ae in the Sweave manual. Claiming that I brought this on myself by using the ae font is just wrong. 
The claim that the ae font is widely known to be lacking some symbols is interesting, however. If so, Sweave should not place \usepackage{ae} at the end of every preamble by default. Here is where I think the question lies. Given ae's weaknesses, what should Sweave do? Is there any reason why Sweave should prefer to insert the ae fonts where I had Latin Modern? I don't think so, if these authors are correct: http://www.dante.de/dante/events/eurotex/papers/TUT09.pdf If there is a good reason to use ae for the serif and sans-serif fonts, we could protect the typewriter font by having Sweave insert this:

\usepackage{ae,aecompl}
\renewcommand{\ttdefault}{lmtt}
\usepackage[T1]{fontenc}

I think we could do even better by having a conditional check on the availability of lmtt before using the package, and if it is not available use cm-super or just cm. I'd just like to set Sweave's default as close to good as possible so more political science professors don't have to go through this same hassle. For the time being, I'm protecting users by applying this patch to Sweave.sty, which makes [noae] the default situation:

$ diff Sweave.sty Sweave.sty-new
8c8
< \setboolean{Sweave@ae}{true}
---
> \setboolean{Sweave@ae}{false}

I'll see what symbols now show up as big black boxes. Actually, those don't scare me so much as the ones that fail to appear at all. It is almost impossible to proofread for that kind of trouble. (Ever accidentally put ~ in ordinary text in a LaTeX document? It just disappears in the output.) pj -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] PCA
On Wed, Mar 10, 2010 at 4:42 PM, Xanthe Walker xanthe.wal...@gmail.com wrote: Hello, I am trying to complete a PCA on a set of standardized ring widths from 8 different sites (T10, T9, T8, T7, T6, T5, T3, and T2). The following is a small portion of my data:

T10     T9      T8      T7      T6      T5      T3      T2
1.33738 0.92669 0.91146 0.98922 0.9308  0.88201 0.92287 0.91775
0.82181 1.05319 0.92908 0.97971 0.95165 0.98029 1.14048 0.77803
0.88294 0.96413 0.90893 0.87957 0.9961  0.74926 0.71394 0.70877
1.07549 1.13311 1.23536 1.19382 1.2362  1.07741 1.20334 0.8727
0.77737 0.99292 0.92703 1.02384 0.99831 1.1317  0.93672 1.07909
0.88933 1.15587 1.20885 0.8983  1.06476 0.81845 1.09017 0.72909
0.75347 0.95826 0.90922 0.73869 0.74846 0.70481 0.49826 0.91824
1.39082 1.1281  1.05147 0.95839 1.20648 1.24587 0.65045 1.23251
0.80977 0.89714 0.90042 0.9543  0.86217 1.20818 0.82725 0.7666
1.11092 1.10328 1.16464 1.00707 1.09575 1.04647 0.79045 0.47331
0.88753 1.04699 1.0854  0.91803 1.03622 0.80624 0.905   1.28271
0.91963 0.90121 0.89136 0.97408 1.0449  1.00572 0.7703  1.48373
1.31837 0.97733 1.04229 1.23096 1.14002 1.09911 0.77523 1.31543

This is what I have done in R:

rm(list = ls(all = TRUE))
PCA <- read.csv("PCAsites.csv", header = TRUE)
PCA
attach(PCA)
names(PCA)
### using correlation matrix ###
p1 <- princomp(PCA, cor = TRUE)
summary(p1)
loadings(p1)
plot(p1)
biplot(p1)
p1$scores
screeplot(p1, npcs = 4, type = "lines")

The problem is that the purpose of this analysis is to derive a new data set, using the first component, that I can work with. In other words, I am doing this to compress the data into one column of ring widths. How do I derive a new data set?

The output of p1$scores is what you want, isn't it? Or just one column of it?

desiredScores <- p1$scores[, 1]

p -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
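To show what that first-column extraction amounts to (a sketch with made-up data standing in for the ring widths), the component scores are just the centered and scaled data projected onto the loading vectors, so the "new data set" is one score per row.

```r
## princomp scores by hand: scale, then project onto the first loading.
set.seed(1)
PCA <- as.data.frame(matrix(rnorm(130 * 8), ncol = 8,
                            dimnames = list(NULL, paste0("T", 1:8))))
p1 <- princomp(PCA, cor = TRUE)
pc1 <- p1$scores[, 1]           # one compressed value per row
## equivalent computation, using princomp's own centering/scaling:
pc1.byhand <- scale(PCA, center = p1$center, scale = p1$scale) %*% p1$loadings[, 1]
all.equal(as.vector(pc1.byhand), as.vector(pc1))   # should be TRUE
```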
Re: [R] Shade area under curve
I use this to make illustrations for some calculus notes. There are examples of shaded areas in there:

### Filename: Normal1_2009_plotmathExample.R
### Paul Johnson June 3, 2009
### This code should be available somewhere in http://pj.freefaculty.org/R. If it is not,
### email me paulj...@ku.edu

### Set mu and sigma at your pleasure:
mu <- 10.03
sigma <- 12.5786786
myx <- seq(mu - 3.5*sigma, mu + 3.5*sigma, length.out = 500)
myDensity <- dnorm(myx, mean = mu, sd = sigma)
# It is challenging to combine plotmath with values of mu and sigma in one expression.
# Either bquote or substitute can be used. First use bquote:
myTitle1 <- bquote(paste("x ~ Normal(", mu == .(round(mu,2)), ',', sigma == .(round(sigma,2)), ")"))
### Using substitute:
### myTitle1 <- substitute( x ~ Normal ~~ group("(", list(mu == mu1, sigma^2 == sigma2), ")"),
###                         list(mu1 = round(mu,2), sigma2 = round(sigma^2,2)))
### Or combine the two:
### t1 <- substitute( mu == a, list(a = mu))
### t2 <- substitute( sigma == a, list(a = round(sigma,2)))
### myTitle1 <- bquote(paste("x ~ Normal(", .(t1), ",", .(t2), ")"))
## To save the plot in a file, change FALSE to TRUE
saveplot <- FALSE
if (saveplot == TRUE) {
  pdf(file = "Normal-2009.pdf", height = 6.5, width = 6.5, onefile = FALSE,
      paper = "special", pointsize = 10)
} else {
  dev.new(height = 6.5, width = 6.5, pointsize = 9)
}
### xpd needed to allow writing outside strict box of graph
par(xpd = TRUE, ps = 10)
plot(myx, myDensity, type = "l", xlab = "x", ylab = "Probability Density",
     main = myTitle1, axes = FALSE)
axis(2, pos = mu - 3.6*sigma)
axis(1, pos = 0)
lines(c(myx[1], myx[length(myx)]), c(0, 0)) ### closes off axes
# bquote creates an expression that text plotters can use
t1 <- bquote( mu == .(mu))
# Find a value of myx that is very close to mu
centerX <- max(which(myx <= mu))
# plot light vertical line under peak of density
lines(c(mu, mu), c(0, myDensity[centerX]), lty = 14, lwd = .2)
# label the mean in the bottom margin
mtext(bquote( mu == .(mu)), 1, at = mu, line = -1)
### find position 20% up vertically, to use for arrow coordinate
ss <- 0.2 * max(myDensity)
# Insert interval to represent width of one sigma
arrows( x0 = mu, y0 = ss, x1 = mu + sigma, y1 = ss, code = 3, angle = 90, length = 0.1)
### Write the value of sigma above that interval
t2 <- bquote( sigma == .(round(sigma,2)))
text( mu + 0.5*sigma, 1.15*ss, t2)
### Create a formula for the Normal
normalFormula <- expression( f(x) == frac(1, sigma*sqrt(2*pi)) * e^{~~ - ~~ frac(1,2) ~~ bgroup("(", frac(x-mu,sigma), ")")^2})
# Draw the Normal formula
text( mu + 0.5*sigma, max(myDensity) - 0.10 * max(myDensity), normalFormula, pos = 4)
### Theory says we should have 2.5% of the area to the left of: -1.96 * sigma.
### Find the X coordinate of that critical value
criticalValue <- mu - 1.96 * sigma
### Then grab all myx values that are to the left of that critical value.
specialX <- myx[myx <= criticalValue]
### mark the critical value in the graph
text( criticalValue, 0, label = paste(round(criticalValue,2)), pos = 1)
### Take sequence parallel to values of myx inside critical region
specialY <- myDensity[myx < criticalValue]
# Polygon makes a nice shaded illustration
polygon(c(specialX[1], specialX, specialX[length(specialX)]), c(0, specialY, 0),
        density = c(-110), col = "lightgray")
shadedArea <- round(pnorm(mu - 1.96 * sigma, mean = mu, sd = sigma), 4)
### I want to insert annotation about area on left side.
al1 <- bquote(Prob(x <= .(round(criticalValue,2))))
al2 <- bquote(phantom(0) == F( .(round(criticalValue,2)) ))
al3 <- bquote(phantom(0) == .(round(shadedArea,3)))
### Hard to position this text just right
### Have tried many ideas, this may be least bad.
### Get center position in shaded area
medX <- median(specialX)
### density at that center point:
denAtMedX <- myDensity[indexMed <- max(which(specialX < medX))]
text(medX, denAtMedX + 0.0055, labels = al1)
text(medX, denAtMedX + 0.004, labels = al2)
text(medX, denAtMedX + 0.0025, labels = al3)
### point from text toward shaded area
arrows( x0 = medX, y0 = myDensity[indexMed] + 0.002,
        x1 = mu - 2.5*sigma, y1 = 0.40*myDensity[length(specialX)], length = 0.1)
ss <- 0.1 * max(myDensity)
### Mark interval from mu to mu-1.96*sigma
arrows( x0 = mu, y0 = ss, x1 = mu - 1.96*sigma, y1 = ss, code = 3, angle = 90, length = 0.1)
### Put text above interval
text( mu - 2.0*sigma, 1.15*ss,
      bquote(paste(.(round(criticalValue,2)), phantom(1) == mu - 1.96 * sigma, sep = "")), pos = 4)
criticalValue <- mu + 1.96 * sigma
### Then grab all myx values that are to the right of that critical value.
specialX <- myx[myx >= criticalValue]
### mark the critical value in the graph
text( criticalValue, 0, label = paste(round(criticalValue,2)), pos = 1)
### Take sequence parallel to values of myx inside critical region
specialY <- myDensity[myx > criticalValue]
# Polygon makes a nice shaded illustration
polygon(c(specialX[1], specialX, specialX[length(specialX)]), c(0, specialY, 0),
        density = c(-110), col = "lightgray")
shadedArea <- round(pnorm(mu + 1.96 * sigma, mean = mu, sd = sigma, lower.tail = FALSE), 4)
### Insert simpler comment on right side.
al2 <- bquote(phantom
[R] Care to share an R presentation?
The R movement is picking up steam in the center of America. People who ignored my R-enthusiasm 10 years ago are now calling me up asking for presentations. I need to make a 2 hour presentation to a collection of faculty and grad students who might like to use R. I don't want to make it seem too complicated (as I often do), but I don't want to mislead them into thinking it will be easy. I expect other r-help readers have been in this same situation. I have a recollection (5, 6 years ago) that one of the R leaders had a slideshow for this exact purpose. But I can't find it now. There is a similar R-help request in which both John Fox and Deepayan Sarkar offered links to their materials. However, the links aren't valid anymore. If I don't find a pre-existing example to work from, I'll slap together a Beamer/Sweave presentation and post it where future speech givers can get it. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Does S inherit the enhancements in R language?
I don't know anybody who has S-PLUS these days, but I expect some of you might, and perhaps you won't mind telling me something. I'm working on my presentation about R for a general audience. As I survey the suggestions from this list about that project, I find myself wondering whether S-PLUS benefits from R. Does S-PLUS syntax change like R's does? I can take code for S and run it through R, but if I took some R code to an S-PLUS system, would it work?

Example 1. _, <-, =. The old S documents I find emphasize assignment with _. When I started with R, that was deprecated, then forbidden. _ was not allowed in variable names; now it is. It was absolutely necessary to use <-; = caused errors. Around the time of R-2.1 or so, it became possible to run R code that has = for assignments. It's not encouraged by the R core team, but = is allowed. Does S+ now accept either <- or =? For that matter, in S+ is it now forbidden to use underscore for assignment, as it is in R?

Example 2. Semicolons are now discouraged in R code. We used to require ; to end commands. Now the R parser is smart enough to spot line breaks and interpret them accordingly. R's been that way for as long as I can remember, but I think the ; was required in the earliest R releases. I rather liked the definitive ; at the end of every command. That looks right to me, probably because of my SAS and C background. Would S+ have a panic attack?

Example 3. Namespace. Does S-PLUS get better as R does? Years ago, I was modeling US Democrats and Republicans and I created an indicator variable called rep. Regression models would not work after that because the rep function had been blocked. It was very frustrating to me. Professor Ripley spotted the error and posted a message called "Don't use the names of R functions as variable names" http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg11585.html. After that, I was terrified that any name I used might conflict with a built-in R function name.
Last month, I saw a student here with a political model who used rep as a variable in a regression model, and it seemed to work just fine. I surmise that the rise in the usage of namespaces in R packages accounts for that? I'm sorry if this is too OT for r-help. pj -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
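Here is a minimal sketch of the situation being described (my own illustration, not code from the thread): a data-frame column named rep does not mask base::rep inside a model formula, because lm() resolves variables in the data frame while function calls are resolved as functions.

```r
## A column named "rep" coexists with the function rep()
d <- data.frame(y = rnorm(10), rep = rbinom(10, 1, 0.5))
m <- lm(y ~ rep, data = d)   # works; the variable "rep" is found in d
coef(m)                      # intercept and a coefficient for rep

## Even a global variable named rep does not break calls to the function,
## because R keeps searching for a *function* when one is being called:
rep <- c(0, 1)
rep(1, 3)                    # still calls base::rep
```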
[R] Question re the plotrix package
Dear list, I am using the clock24.plot command in this excellent package to plot animal activity data. Does anyone know if both symbols and a line can be plotted on the same plot, to show both raw data (symbols) and a line (describing a statistical model of the pattern)? Or if more than one line etc. can be plotted? Thanks Paul
Re: [R] fit.mult.impute() in Hmisc
On Thu, Mar 31, 2011 at 2:56 PM, Yuelin Li li...@mskcc.org wrote: I tried multiple imputation with aregImpute() and fit.mult.impute() in Hmisc 3.8-3 (June 2010) and R-2.12.1. The warning message below suggests that summary(f) of fit.mult.impute() would only use the last imputed data set. Thus, the whole imputation process is ignored. "Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use vcov(fit) to get the correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e."

Hello. I fiddled around with rms multiple imputation when I was preparing these notes from our R summer course. I ran into the same thing you did, and my conclusion is slightly different from yours. http://pj.freefaculty.org/guides/Rcourse/multipleImputation/multipleImputation-1-lecture.pdf Look down to slide 80 or so, where I launch off into that question. It appears to me that aregImpute will give the right answer for fitters from rms, but if you want to feel confident about the results for other fitters, you should use mitools or some other parameter-combining approach. My conclusion (slide 105) is: "Please note: the standard errors in the output based on lrm match the std.errors estimated by MItools. Thus I conclude sqrt(diag(cov(fit.mult.impute.object))) did not give correct results."

But the standard errors in summary(f) agree with the values from sqrt(diag(vcov(f))) to the 4th decimal point. It would seem that summary(f) actually adjusts for multiple imputation? Does summary(f) in Hmisc 3.8-3 actually adjust for MI? If it does not adjust for MI, then how do I get the MI-adjusted coefficients and standard errors? I can't seem to find answers in the documentation, including rereading section 8.10 of the Harrell (2001) book. Googling located a thread in R-help back in 2003, which seemed dated. Many thanks in advance for the help, Yuelin.
http://idecide.mskcc.org

---
library(Hmisc)
Loading required package: survival
Loading required package: splines
data(kyphosis, package = "rpart")
kp <- lapply(kyphosis, function(x)
+ { is.na(x) <- sample(1:length(x), size = 10); x })
kp <- data.frame(kp)
kp$kyp <- kp$Kyphosis == "present"
set.seed(7)
imp <- aregImpute( ~ kyp + Age + Start + Number, dat = kp, n.impute = 10,
+ type = "pmm", match = "closest")
Iteration 13
f <- fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp,
+ family = binomial, data = kp)

Variance Inflation Factors Due to Imputation:
(Intercept)     Age   Start  Number
       1.06    1.28    1.17    1.12

Rate of Missing Information:
(Intercept)     Age   Start  Number
       0.06    0.22    0.14    0.10

d.f. for t-distribution for Tests of Single Coefficients:
(Intercept)     Age   Start  Number
    2533.47  193.45  435.79  830.08

The following fit components were averaged over the 10 model fits:
fitted.values linear.predictors

Warning message:
In fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp, :
Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use vcov(fit) to get the correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e.

f
Call: fitter(formula = formula, family = binomial, data = completed.data)
Coefficients:
(Intercept)     Age   Start  Number
    -3.6971  0.0118 -0.1979  0.6937
Degrees of Freedom: 80 Total (i.e. Null); 77 Residual
Null Deviance: 80.5
Residual Deviance: 58   AIC: 66

sqrt(diag(vcov(f)))
(Intercept)       Age     Start    Number
  1.5444782 0.0063984 0.0652068 0.2454408

-0.1979/0.0652068
[1] -3.0350

summary(f)
Call: fitter(formula = formula, family = binomial, data = completed.data)
Deviance Residuals:
   Min     1Q Median     3Q    Max
-1.240 -0.618 -0.288 -0.109  2.409
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.6971     1.5445   -2.39   0.0167
Age           0.0118     0.0064    1.85   0.0649
Start        -0.1979     0.0652   -3.03   0.0024
Number        0.6937     0.2454    2.83   0.0047
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 80.508 on 80 degrees of freedom
Residual deviance: 57.965 on 77 degrees of freedom
AIC: 65.97
Number of Fisher Scoring iterations: 5
[R] seeking advice about rounding error and %%
A client came into our consulting center with some data that had been damaged by somebody who opened it in MS Excel. The columns were supposed to be integer valued, 0 through 5, but some of the values were mysteriously damaged. There were scores like 1.18329322 and such in there. Until he tracks down the original data and finds out what went wrong, he wants to convert all fractional valued scores to NA. As a quick hack, I suggested an approach using %%:

x <- c(1, 2, 3, 1.1, 2.12131, 2.001)
x %% 1
[1] 0.00000 0.00000 0.00000 0.10000 0.12131 0.00100
which(x %% 1 > 0)
[1] 4 5 6
xbad <- which(x %% 1 > 0)
x[xbad] <- NA
x
[1] 1 2 3 NA NA NA

I worry about whether x %% 1 may ever return a nonzero result for an integer because of rounding error. Is there a recommended approach? What about zapsmall on the left, but what goes on the right of >?

which( zapsmall(x %% 1) > 0 )

Thanks in advance -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
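A tolerance-based sketch of the idea discussed above (my own illustration, not from the thread): treat a value as a whole number when its fractional part is numerically close to 0 or to 1, which guards against values like 3 - 1e-15 whose %% 1 result is just under 1.

```r
## Flag non-integer scores as NA, allowing for floating point fuzz
x <- c(1, 2, 3, 1.1, 2.12131, 2.001, 3 - 1e-15)
tol <- sqrt(.Machine$double.eps)   # a conventional tolerance choice
frac <- x %% 1
## values within tol of 0 or of 1 count as whole numbers
isWhole <- (frac < tol) | (frac > 1 - tol)
x[!isWhole] <- NA
x   # the last element survives even though it is not exactly 3
```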
Re: [R] graphics: 3D regression plane
Hi. Comments below.

On Wed, Apr 27, 2011 at 2:32 AM, agent dunham crossp...@hotmail.com wrote: Hi, thanks, I think I've changed the previous as you told me, but I'm having this error. What does it mean?

model <- lm(log(v1) ~ log(v2) + v3, data=dat)
newax <- expand.grid(
    v2 = seq(min(log(dat$v2)), max(log(dat$v2)), length=100),
    v3 = seq(min(dat$v3), max(dat$v3), length=100))
fit <- predict(model, newax)
graph <- regr2.plot(x=seq(min(log(dat$v2)), max(log(dat$v2)), length=100),
    y=seq(min(dat$v3), max(dat$v3), length=100),
    z=matrix(fit, 100, 100),
    main="Least-squares",
    resid.plot=TRUE, plot.base.points=TRUE,
    expand=0.5, ticktype="detailed", theta=-45)
Error in cbind(x, y, z, 1) %*% pmat : incompatible arguments

Thanks again, u...@host.com

I've struggled with these 3d things before. You should supply us the full testable program to fix. Here I've fiddled around and can get a plot from persp or regr2.plot, but I understand it is not informative. But the plot does run, and maybe you will see how to fix it. Generally, I would suggest you do the separate work in separate steps, not all jammed together inside regr2.plot. If you do each bit separately, then you can inspect the result and be sure it is good.
See:

## PJ 2011-04-27
## testing to address question in r-help on persp and regr2.plot
v1 <- rnorm(100, m=100, s=10)
v2 <- rnorm(100, m=1000, s=100)
v3 <- rnorm(100)
dat <- data.frame(v1, v2, v3)
rm(v1, v2, v3)

model <- lm(log(v1) ~ log(v2) + v3, data=dat)

rv2 <- range(dat$v2)
plv2 <- seq(rv2[1], rv2[2], length.out=100)
rv3 <- range(dat$v3)
plv3 <- seq(rv3[1], rv3[2], length.out=100)

newax <- expand.grid(v2 = plv2, v3 = plv3)
fit <- predict(model, newax)
zmat <- matrix(fit, 100, 100)

persp(x=plv2, y=plv3, z=zmat)

require(HH)
graph <- regr2.plot(x=plv2, y=plv3, z=zmat,
    main="Least-squares",
    resid.plot=TRUE, plot.base.points=TRUE,
    expand=0.5, ticktype="detailed", theta=-45)

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] Testing for arguments in a function
On Mon, Sep 26, 2011 at 2:39 PM, Gene Leynes gley...@gmail.com wrote: I don't understand how this function can subset by i when i is missing.

## My function:
myfun = function(vec, i){
    ret = vec[i]
    ret
}
## My data:
i = 10
vec = 1:100
## Expected input and behavior:
myfun(vec, i)
## Missing an argument, but error is not caught!
## How is subsetting even possible here???
myfun(vec)

Hello, Gene: It seems to me the discussion of your question launches off in a more abstract direction than we need here. I've found it is wise to name arguments to functions differently than variables in the environment, so you don't have the function looking for i outside itself. And you can set each argument to a ridiculous default to force an error when that option is not given. NAN is nothing here, just a symbol that has no value bound to it, so we get an error, as you said you wanted.

myfun <- function(ivec, ii=NAN){
    ivec[ii]
}

myfun(1:40, 10) works.
myfun(1:40) produces: Error in myfun(1:40) : object 'NAN' not found

If you are happy enough to just plop out an error, there's no need to worry. Note the function can be written more succinctly than you originally had it, and you are generally advised to use <- rather than = by the R developers. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] Testing for arguments in a function
Just for the record, following Bill Dunlap's advice, I think this is the best answer to the question as originally posed:

myfun <- function(vec, i = stop("'i' must be supplied")){
    vec[i]
}
myfun(1:40, 10)
[1] 10
myfun(1:10)
Error in myfun(1:10) : 'i' must be supplied

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
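Another common idiom for the same problem (my own addition, not from the thread): test missing() inside the function body, which lets you supply a custom message or a fallback value.

```r
## Explicitly check whether the caller supplied i
myfun <- function(vec, i){
    if (missing(i)) stop("argument 'i' must be supplied")
    vec[i]
}
myfun(1:40, 10)   # works as before
## myfun(1:40)    # would stop with: argument 'i' must be supplied
```

Because default arguments in R are evaluated lazily, the stop() default and the missing() check behave the same way at the call site; missing() is just more explicit.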
Re: [R] Running a GMM Estimation on dynamic Panel Model using plm-Package
On Sun, Jun 12, 2011 at 2:43 PM, bstudent marc.ruet...@gmx.de wrote: Hello, although I searched for a solution related to my problem, I didn't find one yet. My skills in R aren't very advanced, however. For my Diploma thesis I need to run a GMM estimation on a dynamic panel model using the pgmm function in the plm package. The model I want to estimate is: Y(t) = Y(t-1) + X1(t) + X2(t) + X3(t). There are no normal instruments in this model; there should just be the gmm-instruments I need for the model. In order to estimate it, I tried the following code:

library(plm)
test <- pgmm(Y ~ lag(Y, 1) + X1 + X2 + X3 | lag(Y, 1), data=Model, effect="individual", model="onestep")

I tried Model as Modelp <- pdata.frame(... and as Model <- read.table(... but in both cases there's an error message:

Error in solve.default(Reduce("+", A2)) : system is computationally singular: reciprocal condition number = 4.08048e-22

Hello, I have students working on similar problems. Here is what I would say to them: Without a dataset and code that is supposed to work, nobody can figure out what's wrong and help you around it. Two suggestions:

1. Directly contact Yves Croissant, the plm principal author, and give him your R code and the data set. Show him the error output you get. Here's the contact information: Yves Croissant yves.croiss...@univ-reunion.fr If he answers, please let us know. If you don't want to (or can't) give real data, make some up that causes the same crash.

2. Post in here a link to your data and the full code and I will try to debug it to at least find out where this is going wrong. I've been studying debugging with R functions and this is a good opportunity for me. I stopped focusing on panel estimator details in 2000, so I'm rusty, but will probably recognize most of what is going on. If you don't want to broadcast this to everybody, you can feel free to contact me directly; pauljohn at ku.edu is my university address. PJ -- Paul E.
Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
Re: [R] Tinn-R
On Tue, Oct 4, 2011 at 1:25 PM, Charles McClure cmccl...@atrcorp.com wrote: I am new to R and have recently tried Tinn-R with very mixed and unexpected results. Can you point me to a Tinn-R tutorial on the web or a decent reference book? In my experience, Tinn-R does not work so well, and most new users are recommended to try instead Notepad++ with the add-on components R2notepad++ or RStudio. I have MS Windows setup tips here: http://web.ku.edu/~quant/cgi-bin/mw1/index.php?title=Windows:AdminTips Until I see evidence otherwise, I'm concluding that Tinn-R was the best in 2008, but it is harder to configure now and there's no reason to prefer it over Notepad++. Real Men[tm] still use Emacs, but new users may not have enough muscles :) pj Thank you for your help; Charles McClure -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
[R] multi-platform equivalent of x11() ?
I want to write an R help example that throws up 2 graphs in separate windows, for comparison. In Linux I plot one, then run x11() to spawn a new on-screen device. Is there some generic equivalent so I can write an example that will work for Windows and Mac users as well? If there is none, don't you think it would be fun if there were? pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu
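One portable answer to the question above (my own note; check ?dev.new on your R version): dev.new() opens a new graphics device on any platform, dispatching to windows(), quartz(), or x11() as appropriate.

```r
## Two on-screen plots in separate windows, platform-independently
plot(1:10)          # first plot goes to the default device
dev.new()           # spawn a second on-screen device (works on
                    # Windows, Mac, and Linux alike)
plot(rnorm(100))    # second plot lands in the new window
```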
Re: [R] simulate an gradually increase of the number of subjects based on two variables
Suggestion below:

On Tue, Mar 13, 2012 at 1:24 PM, guillaume chaumet guillaumechau...@gmail.com wrote: I omitted to mention that I already tried to generate data based on the mean and sd of two variables.

x <- rnorm(20, 1, 5) + 1:20
y <- rnorm(20, 1, 7) + 41:60
simu <- function(x, y, n) {
    simu <- vector("list", length=n)
    for(i in 1:n) {
        x <- c(x, rnorm(1, mean(x), sd(x)))
        y <- c(y, rnorm(1, mean(y), sd(y)))
        simu[[i]]$x <- x
        simu[[i]]$y <- y
    }
    return(simu)
}
test <- simu(x, y, 60)
lapply(test, function(x) cor.test(x$x, x$y))

As you can see, the correlation disappears with increasing N. Perhaps a bootstrap with lm or cor.test could solve my problem.

In this case, you should consider creating the LARGEST sample first, and then remove cases to create the smaller samples. The problem now is that you are drawing a completely fresh sample every time, so you are getting not only the effect of sample size, but also the extra randomness when case 1 is replaced every time. I am fairly confident (80%) that if you approach it my way, the mystery you see will start to clarify itself. That is, draw the big sample with the desired characteristic, and once you understand the sampling distribution of cor for that big sample, you will also understand what happens when each large sample is reduced by a few cases. BTW, if you were doing this on a truly massive scale, my way would run much faster. Allocate memory once; then there is no need to manually delete lines, just trim down the index on the rows. (Same data access concept as bootstrap.) pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu
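The suggestion above can be sketched like this (my illustration, not the poster's code): draw the biggest sample once, then study smaller n by trimming the row index, so only sample size varies, not the draws themselves.

```r
## Draw one large sample, then compute cor() on nested subsamples
set.seed(42)
N <- 80
x <- rnorm(N, 1, 5) + 1:N
y <- rnorm(N, 1, 7) + 41:(40 + N)

## correlations for growing subsets of the SAME data
sizes <- c(20, 40, 60, 80)
sapply(sizes, function(n) cor(x[1:n], y[1:n]))
```

Because each smaller sample is a subset of the larger one, differences across sizes reflect sample size alone rather than fresh sampling noise.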
Re: [R] beginner's loop issue
On Tue, Mar 13, 2012 at 11:27 AM, aledanda danda.ga...@gmail.com wrote: Dear All, I hope you don't mind helping me with this small issue. I haven't been using R in years and I'm trying to fill in a matrix with the output of a function (I'm probably using the Matlab logic here and it's not working). Here is my code:

for (i in 1:length(input)){
    out[i,1:3] <- MyFunction(input[i,1], input[i,2], input[i,3])
    out[i,4:6] <- MyFunction(input[i,5], input[i,7], input[i,6])
    out[i,7:9] <- MyFunction(input[i,8], input[i,10], input[i,9])
}

'input' is a matrix
dim(input)
[1] 46 10
and each row corresponds to a different subject. The error I get here is

Error in out[i, 1:3] <- get.vaTer(input[i, 2], input[i, 4], input[i, 3], : object 'out' not found

out has to exist first, as a previous commenter said. Furthermore, suggestions: Consider making MyFunction accept a vector of 3 arguments, rather than separate arguments. Consider making out 9 columns, as in

out <- matrix(0, nrow=N, ncol=9)
for(i ...){
    out[i,1:3] <- MyFunction(input[i,1:3])
    out[i,4:6] <- MyFunction(input[i,4:6])
    out[i,7:9] <- MyFunction(input[i,7:9])
}

If you could re-shape your input thing as a list with one element that needs to go into MyFunction, this could get easier still: lapply(input, MyFunction). Or if input were an array with 3 columns, you could revise MyFunction to accept a 3-vector: apply(input, 1, MyFunction). Hardly ever in R does one need to specify inputs as you have done in your example. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu
Re: [R] Models with ordered and unordered factors
On Tue, Nov 15, 2011 at 9:00 AM, Catarina Miranda catarina.mira...@gmail.com wrote: Hello; I am having problems with the interpretation of models using ordered or unordered predictors. I am running models in lmer, but I will try to give a simplified example data set using lm. Both in the example and in my real data set I use a predictor variable referring to 3 consecutive days of an experiment. It is a factor, and I thought it would be more correct to consider it ordered. Below is my example code with my comments/ideas along it. Can someone help me to understand what is happening?

Dear Catarina: I have had the same question, and I hope my answers help you understand what's going on. The short version: http://pj.freefaculty.org/R/WorkingExamples/orderedFactor-01.R The longer version, "Working with Ordinal Predictors": http://pj.freefaculty.org/ResearchPapers/MidWest09/Midwest09.pdf HTH pj

Thanks a lot in advance; Catarina Miranda

y <- c(72,25,24,2,18,38,62,30,78,34,67,21,97,79,64,53,27,81)
Day <- c(rep("Day 1",6), rep("Day 2",6), rep("Day 3",6))
dataf <- data.frame(y, Day)
str(dataf)   # Day is not ordered
#'data.frame': 18 obs. of 2 variables:
# $ y  : num 72 25 24 2 18 38 62 30 78 34 ...
# $ Day: Factor w/ 3 levels "Day 1","Day 2",..: 1 1 1 1 1 1 2 2 2 2 ...
summary(lm(y ~ Day, data=dataf))
# Day 2 is not significantly different from Day 1, but Day 3 is.
#
#Call:
#lm(formula = y ~ Day, data = dataf)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-39.833 -14.458  -3.833  13.958  42.167
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)   29.833      9.755   3.058  0.00797 **
#DayDay 2      18.833     13.796   1.365  0.19234
#DayDay 3      37.000     13.796   2.682  0.01707 *
#---
#Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 23.9 on 15 degrees of freedom
#Multiple R-squared: 0.3241, Adjusted R-squared: 0.234
#F-statistic: 3.597 on 2 and 15 DF, p-value: 0.05297

dataf$Day <- ordered(dataf$Day)
str(dataf)   # Day 1 < Day 2 < Day 3
#'data.frame': 18 obs. of 2 variables:
# $ y  : num 72 25 24 2 18 38 62 30 78 34 ...
# $ Day: Ord.factor w/ 3 levels "Day 1"<"Day 2"<..: 1 1 1 1 1 1 2 2 2 2 ...
summary(lm(y ~ Day, data=dataf))
# Significances reversed (or are Day.L and Day.Q not synonymous with Day 2
# and Day 3?): Day 2 (.L) is significantly different from Day 1, but
# Day 3 (.Q) isn't.
#Call:
#lm(formula = y ~ Day, data = dataf)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-39.833 -14.458  -3.833  13.958  42.167
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  48.4444     5.6322   8.601 3.49e-07 ***
#Day.L        26.1630     9.7553   2.682   0.0171 *
#Day.Q        -0.2722     9.7553  -0.028   0.9781
#---
#Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
#Residual standard error: 23.9 on 15 degrees of freedom
#Multiple R-squared: 0.3241, Adjusted R-squared: 0.234
#F-statistic: 3.597 on 2 and 15 DF, p-value: 0.05297

-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
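What the .L and .Q rows mean (my own note, from R's documented contrast behavior, not from the thread): an ordered factor gets orthogonal polynomial contrasts by default, so Day.L and Day.Q are linear and quadratic trend terms across the three days; they are NOT dummies for Day 2 and Day 3. Comparing the two default contrast matrices makes this visible:

```r
## Unordered default: treatment contrasts, i.e. dummies vs the first level
contr.treatment(3)

## Ordered default: orthogonal polynomial contrasts,
## one linear (.L) and one quadratic (.Q) column
contr.poly(3)
```

That is why the coefficients and p-values change when the factor is declared ordered, even though the overall F statistic and R-squared stay the same: the model spans the same column space, just with a different parameterization.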
[R] how big (in RAM and/or disk storage) is each of these objects in a list?
Greetings, friends (and others :) ) We generated a bunch of results and saved them in an RData file. We can open and use it; all is well, except that the size of the saved file is quite a bit larger than we expected. I suspect there's something floating about in there that one of the packages we are using puts in, such as a spare copy of a data frame that is saved in some subtle way that has escaped my attention. Consider a list of objects. Are there ways to do these things: 1. Ask R how much memory is used by the things inside the list? 2. Does as.expression(anObject) print everything in there? Or is there a better way to convert each thing to text or some other format that you can actually read line by line, to see everything? If there's no giant hidden data frame floating about, I figure I'll have to convert symmetric matrices to lower triangles or some such to save space. Unless R is already automatically saving a matrix in that way but just showing me the full matrix, which I suppose is possible. If you have other ideas about general ways to make saved objects smaller, I'm open to suggestions. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
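For question 1, here is a hedged sketch (my own, not from the thread): object.size() reports the memory a single R object uses, and applied over a list it ranks the elements by size. Note it can undercount structures that share memory or hold external pointers.

```r
## Rank the elements of a list by their in-memory size
mylist <- list(a = rnorm(1e5),
               b = matrix(0, 500, 500),
               c = letters)
sizes <- sapply(mylist, object.size)   # bytes per element
sort(sizes, decreasing = TRUE)
## str(mylist, max.level = 2) is one way to peek inside each element
## without printing everything, as an alternative to as.expression().
```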
[R] fundamental guide to use of numerical optimizers?
I was in a presentation of optimizations fitted with both MPlus and SAS yesterday. In a batch of 1000 bootstrap samples, between 300 and 400 of the estimations did not converge. The authors spoke as if this were the ordinary cost of doing business, and pointed to some publications in which the nonconvergence rate was as high or higher. I just don't believe that's right; if a problem is posed so that the estimate cannot be obtained in such a large share of applications, it means the problem is either badly asked or badly answered. But I've got no traction unless I can actually do better. Perhaps I can use this opportunity to learn about R functions like optim, or perhaps maxLik. From reading r-help, it seems to me there are some basic tips for optimization, such as:

1. It is wise to scale the data so that all columns have the same range before running an optimizer.
2. With estimates of variance parameters, don't try to estimate sigma directly; instead estimate log(sigma), because that puts the domain of the solution onto the whole real line.
3. With estimates of proportions, estimate instead the logit, for the same reason.

Are these mistaken generalizations? Are there other tips that everybody ought to know? I understand this is a vague question; perhaps the answers are just in the folklore. But if somebody has written them out, I would be glad to know. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas
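Tip 2 can be sketched concretely (my illustration, under the assumption of a simple normal model): maximize a log-likelihood over log(sigma) so the optimizer works on an unconstrained scale and can never propose a negative standard deviation.

```r
## Normal MLE with sigma reparameterized as exp(par[2])
set.seed(1)
y <- rnorm(200, mean = 5, sd = 2)

negll <- function(par){
    mu    <- par[1]
    sigma <- exp(par[2])           # par[2] is log(sigma), unconstrained
    -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

fit <- optim(c(0, 0), negll)
c(mu = fit$par[1], sigma = exp(fit$par[2]))   # back-transform at the end
```

The same trick works for proportions via qlogis()/plogis(), which is tip 3.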
[R] pls help to print out first row of terms(model) output in example program
Greetings. I've written a convenience function for multicollinearity diagnosis. I'd like to report to the user the formula that is used in a regression. I get output like this: mcDiagnose(m1) [1] The following auxiliary models are being estimated and returned in a list: [1] `x1` ~ . formula(fmla)() [1] `x2` ~ . I'd like to fill in the period with the variable names that are in there. I know the information is in there, I just need to get it. The terms output for a fitted model has the output at the very top, in the first line, above the attributes. I just want to print out that first line, here: terms(m2) y ~ log(10 + x1) + poly(x2, 2) attr(,variables) list(y, log(10 + x1), poly(x2, 2)) attr(,factors) log(10 + x1) poly(x2, 2) y 0 0 log(10 + x1)1 0 poly(x2, 2) 0 1 attr(,term.labels) [1] log(10 + x1) poly(x2, 2) [snip] In my working example code below , I need the help where I have ##fix me fix me## ##Paul Johnson ## 2011-12-19 ## mcDiagnose.R lmAuxiliary - function(model){ dat - as.data.frame(model.matrix(model)) ## ivnames - attr(delete.response(terms(model)), term.labels) ## previous does not work with transforms like poly hasIntercept - attr(terms(model), intercept) if (hasIntercept) dat - dat[ , -1] # removes intercept. assumes intercept in column 1 ivnames - colnames(dat) print(The following auxiliary models are being estimated and returned in a list:) results - list() for (i in ivnames) { fmla - paste( `, i, `, ~ . , sep=); print(fmla) maux - lm( formula(fmla), data=dat) results[[ i ]] - maux print(maux$call[2]) ###fix me fix me ## } results } mcDiagnose - function(model){ auxRegs - lmAuxiliary(model) auxRsq - numeric(length=length(auxRegs)) j - 0 for ( i in auxRegs ){ j - j + 1 sumry - summary(i) auxRsq[j] - sumry$r.squared } print(Drum roll please!) 
    print("And your R_j Squareds are (auxiliary Rsq)")
    print(names(auxRegs))
    print(auxRsq)
}

x1 <- rnorm(100)
x2 <- rnorm(100)
y <- rnorm(100)
m1 <- lm(y ~ x1 + x2)
mcDiagnose(m1)
m2 <- lm(y ~ log(10 + x1) + poly(x2, 2))
mcDiagnose(m2)
-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
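[Editorial addendum, not part of the original post: one way to print just the formula line, an assumption on my part about what is wanted, is to deparse the formula recovered from the terms object.]

```r
## Sketch: recover and print only the first line of the terms() output,
## i.e. the formula itself
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), y = rnorm(100))
m2 <- lm(y ~ log(10 + x1) + poly(x2, 2), data = dat)
fmla <- formula(terms(m2))
print(deparse(fmla))   # "y ~ log(10 + x1) + poly(x2, 2)"
```

Inside lmAuxiliary, the same deparse(formula(maux)) call would report each auxiliary formula with the variable names filled in.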
[R] 3d plotting alternatives. I like persp, but regret the lack of plotmath.
I have been making simple functions to display regressions in a new package called rockchalk. For 3d illustrations, my functions use persp, and I've grown to like working with it. As an example of the kind of thing I like to do, you might consult my lecture on multicollinearity, which is by far the most detailed illustration I've prepared. http://pj.freefaculty.org/guides/stat/Regression/Multicollinearity/Multicollinearity-1-lecture.pdf I used persp mainly because I can understand it, and it can be made to work like plot in R, with additional tools like lines and points and such. I don't want to interact with these plots; I just need to put them into lecture documents relatively easily. And I've also succeeded in turning them into flash video with R2SWF, which works great! Last summer, I put together some lecture notes illustrating persp, scatterplot3d, and persp3d: http://pj.freefaculty.org/guides/Rcourse/plot-3d/plots-3d.pdf. As I review that now, I see I did not make any progress on the lattice-based plotters, or at least I did not write it down. scatterplot3d did almost everything I needed, but not everything, and so I used persp in my newer effort. Then I put some plotmath in an axis label and ran into the problem that everybody else who uses persp finds eventually: persp doesn't allow expressions in axis labels. From searching in r-help, I see that many people have run up against the same trouble. Most people say too bad, I'll switch to some other tool. Suggested alternatives: 1. Use wireframe or cloud in lattice. They can handle plotmath. I've been studying that, but I just can't get the kinds of graphs I want from it. In the persp framework, I can draw the 3d plot and then add details to it one by one. I can comprehend the little steps. In wireframe and cloud, it *appears* to me I have to pack all of that work into one single command, and that is, well, either too difficult or impossible.
Or perhaps I'm just not understanding the documentation. If I could make the sorts of plots I need with lattice tools, I would do it. But I'm really blocked at the front door by the write-one-giant-function-call nature of it. I realize that's vague because I have not told you specifically what I want to do. If there is a lattice expert reading, can you give me some HOWTO hints? Take, for example, slide 19 in this one: http://pj.freefaculty.org/guides/stat/Regression/Multicollinearity/Multicollinearity-1-lecture.pdf gray dots in the x1-x2 plane, blue points hovering above, red pointed arrows from gray to blue. And then later on, say slides 36-60, I have regression planes drawn in, with the arrows from the plane to the points. 2. Use rgl-based tools. These are especially neat if you want to interact with the graph--spin a 3d graph or get a display with lively colors. It works more like plot, in the sense that you can draw the figure and then add components with points3d and so forth. And there are nice working examples of extensions in misc3d. And also, the Rcmdr interface has a working pull-down menu system that can make rgl 3d graphs. That's a big plus. After playing with some examples, I see these troubles: the only output device that runs well (finishes and does not generate massive files) is png; the output quality on the screen is quite beautiful, but the transition to black-and-white is not as nice looking as persp, in my opinion; the graphs draw much more slowly; and they are more difficult to script in an Sweave document, it seems to me. If those criticisms are wrong, I would be glad to know. So I'm back to wondering why persp can't be updated. Nobody has explained why it is not possible to revise persp to allow expressions in axis labels. Perhaps nobody has done it because people think that persp has no fans :) But I'm a fan. -- Paul E.
Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
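[Editorial addendum, not part of the original post: one partial workaround sometimes suggested for the plotmath limitation, sketched here rather than a fix to persp itself, is to suppress the built-in label and place an expression with trans3d() using the transformation matrix persp returns.]

```r
## Sketch: persp() returns a 4x4 projection matrix; trans3d() maps 3d
## coordinates to the 2d plot, so text() can place a plotmath expression
x <- y <- seq(-2, 2, length.out = 30)
z <- outer(x, y, function(a, b) exp(-(a^2 + b^2)))
pmat <- persp(x, y, z, zlab = "", theta = 30, phi = 25)
lab <- trans3d(0, 0, max(z) * 1.15, pmat)    # a point just above the surface
text(lab$x, lab$y, labels = expression(hat(f)(x[1], x[2])))
```

This only handles labels floating in the plot region; a rotated label along an axis is harder.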
[R] How to properly re-set a saved seed? I've got the answer, but no explanation
Hello, happy new year. I've come back to a problem from last spring. I need to understand what is the technically correct method to save a seed and then re-set it. Although this example arises in the context of MT19937, I'm needing to do this with other generators as well, including L'Ecuyer streams. The puzzle is this comment in ?Random: ‘set.seed’ is the recommended way to specify seeds. What I did not understand before, and can't make sense of now, is that set.seed does not re-set a saved seed. Here's my working example:

## Paul Johnson
## April 20, 2011
## If you've never dug into the R random number generator and its use of the
## default MT19937, this might be informative. It will also help you
## avoid a time-consuming mistake that I've made recently.
## I've been developing a simulation and I want to save and restore random
## streams exactly. I got that idea from John Chambers' Software for
## Data Analysis and I studied his functions to see how he did it.
## The problem, as we see below, is that set.seed() doesn't do what I expected,
## and I feel lucky to have noticed it now, rather than later.
## I wish set.seed did work the way I want, though, and that's why I'm
## writing here, to see if anybody else wishes the same.
## Here's a puzzle. Try this:
set.seed(444)
s1 <- .Random.seed
runif(1)
rnorm(1)
set.seed(444)
runif(1)
rnorm(1)
## Those matched. Good.
## Re-insert the saved seed s1
set.seed(s1)
runif(1)
rnorm(1)
## Why don't the draws match? I put back the seed, didn't I?
## Hm. Try again
set.seed(s1)
runif(1)
rnorm(1)
## Why did those match? But neither matches cases 1 and 2.
## I was baffled and discouraged.
## The help page for random numbers says:
## ‘set.seed’ is the recommended way to specify seeds.
## But it doesn't say something like
## set.seed(s1)
## is supposed to work. Ah. My mistake. Here's the fix and the puzzle.
## To re-set the generator to its position at s1, it is required
## to do what the help page seems to say we should not.
.Random.seed <- s1
runif(1)
-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
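[Editorial addendum, a compact restatement of the lesson in the post above: set.seed() takes a single integer and derives a fresh state from it, while .Random.seed is the full generator state, so restoring the state means assigning .Random.seed itself at top level.]

```r
## Save the full state, draw, restore, draw again: the streams match exactly
set.seed(444)
s1 <- .Random.seed
x1 <- c(runif(2), rnorm(2))
.Random.seed <- s1          # put the saved state back (top level / globalenv)
x2 <- c(runif(2), rnorm(2))
identical(x1, x2)           # TRUE
```

Note that inside a function this assignment would only create a local variable; there one would use assign(".Random.seed", s1, envir = globalenv()).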
[R] fix and edit don't work: unable to open X Input Method-segfault
I can't run fix() or edit() anymore. Did I break my system? I'm running Debian Linux with R-2.14.1. As far as I can tell, the R packages came from Debian's testing (wheezy) repository. I would like to know if users on other types of systems see the same problem. If not, then, obviously, it is a Debian-only issue and I can approach it from that point of view. And if no other Debian users see the same, it means it is a me-only problem, and that's discouraging :) I get this same R crash whether I try fix when R is running in a terminal or in Emacs with ESS. I've not seen this before, but Google leads to some bug reports on Ubuntu in 2007, where it was claimed that the problem was fixed. The really bad part is that the second try causes a segmentation fault in R itself.

library(ggplot2)
Loading required package: reshape
Loading required package: plyr
Attaching package: ‘reshape’
The following object(s) are masked from ‘package:plyr’: rename, round_any
Loading required package: grid
Loading required package: proto

sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.6

fix(mpg)
Error in dataentry(datalist, modes) : invalid device
In addition: Warning message:
In edit.data.frame(get(subx, envir = parent), title = subx, ...) :
  unable to open X Input Method

fix(mpg)
*** caught segfault ***
address (nil), cause 'unknown'
Traceback:
1: edit.data.frame(get(subx, envir = parent), title = subx, ...)
2: edit(get(subx, envir = parent), title = subx, ...)
3: fix(mpg)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

The same happens no matter what packages are loaded, so far as I can tell. Here it is without ggplot2, in case you were suspicious of those particular datasets.

library(datasets)
datasets()
Error: could not find function "datasets"
help(package=datasets)
fix(CO2)
Error in dataentry(datalist, modes) : invalid device
In addition: Warning message:
In edit.data.frame(get(subx, envir = parent), title = subx, ...) :
  unable to open X Input Method
-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] question on model.matrix
Greetings. On Sat, Jan 28, 2012 at 2:43 PM, Daniel Negusse daniel.negu...@my.mcphs.edu wrote: while reading some tutorials, i came across this and i am stuck. i want to understand it and would appreciate if anyone can tell me. design <- model.matrix(~ -1 + factor(c(1,1,2,2,3,3))) can someone break down this code and explain to me what the ~ and the -1+factor are doing?

A formula would be y ~ x, so when you don't include y, it means you only want the right hand side variables. The term "design matrix" generally means the numeric coding that is fitted in a statistical procedure. The -1 in the formula means "do not insert an intercept for me." It affects the way the factor variable is converted to numeric contrasts in the design matrix. If there is an intercept, then the contrasts have to be adjusted to prevent perfect multicollinearity. If you run a few examples, you will see. This uses lm, but the formula and design matrix ideas are the same. Note, with an intercept, I get 3 dummy variables from x2, but with no intercept, I get 4 dummies:

x1 <- rnorm(16)
x2 <- gl(4, 4, labels = c("none", "some", "more", "lots"))
y <- rnorm(16)
m1 <- lm(y ~ x1 + x2)
model.matrix(m1)
   (Intercept)          x1 x2some x2more x2lots
1            1 -0.2567          0      0      0
2            1  0.94963659      0      0      0
3            1  0.06915561      0      0      0
4            1  0.89971204      0      0      0
5            1  0.73817482      1      0      0
6            1  2.92451195      1      0      0
7            1 -0.80682449      1      0      0
8            1  1.07472998      1      0      0
9            1  1.34949123      0      1      0
10           1 -0.42203984      0      1      0
11           1 -1.66316740      0      1      0
12           1 -2.83232063      0      1      0
13           1  1.26177313      0      0      1
14           1  0.10359857      0      0      1
15           1 -1.85671242      0      0      1
16           1 -0.25140729      0      0      1
attr(,"assign")
[1] 0 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"

m2 <- lm(y ~ -1 + x1 + x2)
model.matrix(m2)
            x1 x2none x2some x2more x2lots
1  -0.2567         1      0      0      0
2   0.94963659     1      0      0      0
3   0.06915561     1      0      0      0
4   0.89971204     1      0      0      0
5   0.73817482     0      1      0      0
6   2.92451195     0      1      0      0
7  -0.80682449     0      1      0      0
8   1.07472998     0      1      0      0
9   1.34949123     0      0      1      0
10 -0.42203984     0      0      1      0
11 -1.66316740     0      0      1      0
12 -2.83232063     0      0      1      0
13  1.26177313     0      0      0      1
14  0.10359857     0      0      0      1
15 -1.85671242     0      0      0      1
16 -0.25140729     0      0      0      1
attr(,"assign")
[1] 1 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"

I think you'll need to mess about with R basics like plot and lm before you go off using the formulas that you really care about. Otherwise, well, you'll always be lost about stuff like ~ and -1. I've started posting all my lecture notes (source code, R code, pdf output) at http://pj.freefaculty.org/guides. That might be a quick start for you. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
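[Editorial addendum, a toy version of the point above: the -1 does not change the number of columns here, only which columns appear, as the column names show.]

```r
## With an intercept one level is absorbed into it; without, every level
## gets its own indicator column
f <- factor(c(1, 1, 2, 2, 3, 3))
colnames(model.matrix(~ f))       # "(Intercept)" "f2" "f3"
colnames(model.matrix(~ -1 + f))  # "f1" "f2" "f3"
```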
[R] replacing characters in matrix. substitute, delayedAssign, huh?
A user question today has me stumped. Can you advise me, please? The user wants a matrix that has some numbers, some variables, possibly even some function names. So that has to be a character matrix. Consider:

BM <- matrix(0.1, 5, 5)

Use data.entry(BM) or similar to set some cells to more abstract values.

BM[3,1] <- "a"
BM[4,2] <- "b"
BM[5,2] <- "b"
BM[5,3] <- "d"
BM
     var1  var2  var3  var4  var5
[1,] "0.1" "0.1" "0.1" "0.1" "0.1"
[2,] "0.1" "0.1" "0.1" "0.1" "0.1"
[3,] "a"   "0.1" "0.1" "0.1" "0.1"
[4,] "0.1" "b"   "0.1" "0.1" "0.1"
[5,] "0.1" "b"   "d"   "0.1" "0.1"

Later on, user code will set values, e.g.,

a <- rnorm(1)
b <- 17
d <- 4

Now, push those into BM, convert the whole thing to numeric

newBM <- apply(BM, c(1,2), as.numeric)

and use newBM for some big calculation. Then re-set new values for a, b, d, and do the same over again. I've been trying lots of variations on parse, substitute, and eval. The most interesting function I learned about this morning was delayedAssign. If I had only to work with one scalar, it does what I want:

delayedAssign("a", whatA)
whatA <- 91
a
[1] 91

I can't see how to make that work in the matrix context, though. Got ideas? pj

sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.14.1
-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] replacing characters in matrix. substitute, delayedAssign, huh?
Henrik's proposal works well, so far. Thanks very much. I could not have figured that out (without much more suffering). Here's the working example in case future googlers find their way to this thread.

## Paul Johnson paulj...@ku.edu
## 2012-01-30
## Special thanks to r-help email list contributors,
## especially Henrik Bengtsson

BM <- matrix(0.1, 5, 5)
BM[2,1] <- "a"
BM[3,2] <- "b"
BM

parseAndEval <- function(x, ...) eval(parse(text = x))

a <- 0.5
b <- 0.4
realBM <- apply(BM, MARGIN = c(1,2), FUN = parseAndEval)

BM[4,5] <- rnorm(1, m = 7, sd = 1)
BM
realBM <- apply(BM, MARGIN = c(1,2), FUN = parseAndEval)
realBM

## Now, what about gui interaction with that table?
## The best nice looking options are not practical at the moment.
## Try this instead
data.entry(BM)
## That will work on all platforms, so far as I know, without
## any special effort from us. Run that, make some changes, then
## make sure you insert new R variables to match in your environment.
## Suppose you inserted the letter "z" in there somewhere.
## Set z out here
z <- rpois(1, lambda = 10)
realBM <- apply(BM, MARGIN = c(1,2), FUN = parseAndEval)
-- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
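[Editorial addendum: the pattern above, condensed to a self-checking toy so the mechanics are visible at a glance.]

```r
## Each cell is a character string; parse + eval turns "0.1" into a number
## and "a" into whatever a is bound to at the time of the apply() call
parseAndEval <- function(x, ...) eval(parse(text = x))
BM <- matrix("0.1", 2, 2)
BM[2, 1] <- "a"
a <- 0.5
realBM <- apply(BM, c(1, 2), parseAndEval)
```

Re-running the apply after changing a refreshes realBM, which is exactly the re-set-and-recompute cycle the original question described.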
[R] trouble automating formula edits when log or * are present; update trouble
Greetings. I want to take a fitted regression and replace all uses of a variable in a formula. For example, I'd like to take

m1 <- lm(y ~ x1, data = dat)

and replace x1 with something else, say x1c, so the formula would become

m1 <- lm(y ~ x1c, data = dat)

I have working code to finish that part of the problem, but it fails when the formula is more complicated. If the formula has log(x1) or x1:x2, the update code I'm testing doesn't get it right. Here's the test code:

## PJ
## 2012-05-29
dat <- data.frame(x1 = rnorm(100, m = 50), x2 = rnorm(100, m = 50),
                  x3 = rnorm(100, m = 50), y = rnorm(100))
m1 <- lm(y ~ log(x1) + x1 + sin(x2) + x2 + exp(x3), data = dat)
m2 <- lm(y ~ log(x1) + x2*x3, data = dat)

suffixX <- function(fmla, x, s){
    upform <- as.formula(paste0(". ~ .", " - ", x, " + ", paste0(x, s)))
    update.formula(fmla, upform)
}

newFmla <- formula(m2)
newFmla
suffixX(newFmla, "x2", "c")
suffixX(newFmla, "x1", "c")

The last few lines of the output. See how the update misses x1 inside log(x1) or in the interaction?

newFmla
y ~ log(x1) + x2 * x3
suffixX(newFmla, "x2", "c")
y ~ log(x1) + x3 + x2c + x2:x3
suffixX(newFmla, "x1", "c")
y ~ log(x1) + x2 + x3 + x1c + x2:x3

It gets the target if the target is all by itself, but not otherwise. After messing with this for quite a while, I conclude that update was the wrong way to go, because it is geared to replacement of individual bits, not editing all instances of a thing. So I started studying the structure of formula objects. I noticed this really interesting thing: the newFmla object can be probed recursively to eventually reveal all of the individual pieces:

newFmla
y ~ log(x1) + x2 * x3
newFmla[[3]]
log(x1) + x2 * x3
newFmla[[3]][[2]]
log(x1)
newFmla[[3]][[2]][[2]]
x1

So, if you could tell me of a general way to walk through a formula object, couldn't I use gsub or something like that to recognize each instance of x1 and replace it with x1c?
I just can't figure out how to automate the checking of each possible element in a formula, to get the right combination of [[]][[]][[]]. See what I mean? I need to avoid this:

newFmla[[3]][[2]][[3]]
Error in newFmla[[3]][[2]][[3]] : subscript out of bounds

pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] trouble automating formula edits when log or * are present; update trouble
Try substitute:

do.call(substitute, list(newFmla, setNames(list(as.name("x1c")), "x1")))
y ~ log(x1c) + x2 * x3

Damn. That's pretty. I'd say setNames is a magic bullet too. Thanks very much. The approach suggested by Michael and Bert has the little shortcoming that grepping for x1 also picks up similarly named variables like x1new and x1old. I've not yet found a similar downside with Gabor's method. pj -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
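[Editorial addendum: the trick spelled out as a runnable whole, including a check of the "x1new is untouched" claim made above. Passing the formula through do.call makes substitute treat the formula object itself as the expression to rewrite.]

```r
## Replace every occurrence of the symbol x1 with x1c, even inside log()
newFmla <- y ~ log(x1) + x2 * x3
newFmla2 <- do.call(substitute,
                    list(newFmla, setNames(list(as.name("x1c")), "x1")))
deparse(newFmla2)   # "y ~ log(x1c) + x2 * x3"

## Unlike gsub on the deparsed text, a similarly named variable is untouched
f2 <- do.call(substitute, list(y ~ x1 + x1new, list(x1 = as.name("x1c"))))
deparse(f2)         # "y ~ x1c + x1new"
```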
[R] memory usage benefit from anonymous variable constructions.
This is an "I was just wondering" question. When the package dataframe was announced, the author claimed to reduce the number of times a data frame is copied, and I started to wonder whether I should care about this in my projects. Has anybody written a general guide for how to write R code that doesn't needlessly exhaust RAM? In Objective-C, we used to gain some considerable advantage by avoiding declaring objects separately, using anonymous variables instead. The storage was allocated on the stack, I think, and there was talk that the numbers might stay 'closer' to the CPU (registers?) for immediate usage. Is there a benefit of this sort in R as well? For example, instead of the way I would usually do this:

mf <- model.frame(model)
y <- model.response(mf)

here is the anonymous alternative, in which mf is never declared:

y <- model.response(model.frame(model))

On the face of it, I can imagine this might be better because no permanent thing mf is created, and the garbage collector wouldn't be called into play if all the data is local and disappears immediately. But, then again, R is doing lots of stuff under the hood that I've never bothered to learn about. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
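[Editorial addendum, not from the thread: one concrete tool for this question is tracemem(), which reports when an object is duplicated, so one can check empirically whether a named intermediate like mf actually costs an extra copy. It requires an R build with memory profiling enabled, which the standard binaries have.]

```r
## R copies on modification, not on assignment, so the named intermediate
## by itself does not duplicate the data
x <- runif(1e5)
tracemem(x)       # prints x's address; duplications are reported from now on
y <- x            # no copy yet: x and y share the same memory
y[1] <- 0         # the duplication happens here, and tracemem reports it
untracemem(x)
```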
Re: [R] Partial R-square in multiple linear regression
On Fri, Jun 1, 2012 at 2:05 PM, Jin Choi oohps...@gmail.com wrote: Hello, I am trying to obtain the partial r-square values (r^2 or R2) for individual predictors of an outcome variable in multiple linear regression. I am using the 'lm' function to calculate the beta coefficients; however, I would like to know the individual % contributions of several independent variables. I tried searching for this function in many R packages, but it has proven elusive to me. I am an R beginner, and I am hoping that I will find a solution! Thank you very much. Sincerely,

This is the kind of practical user request I've been trying to answer in the rockchalk package. I have a function called getDeltaRsquare. It takes a fitted regression and calculates the change in R-square when each variable is removed. It does not achieve the "partition" idea for R-square that you might want, because there is no logically meaningful way to partition R-square among predictors if there is any multicollinearity. I could point you at some stat books that try to achieve that partition; mostly they seem to be by psychologists (who like that kind of thing). You will go down a rabbit hole of semi-partial correlation coefficients and so forth. But if you just want "how much does R-square drop if I leave out this variable," I've got that for you :) Whether that is meaningful to you, well, that's a bigger methodological question. pj Jin Choi MSc Epidemiology student McGill University, Montreal, Quebec, CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Paul E. Johnson Professor, Political Science Assoc.
Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
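[Editorial addendum: the delta-R-square idea can be sketched in a few lines. This is my own minimal version, not the rockchalk implementation, and the data are made up.]

```r
## "How much does R-square drop if I leave out x1?"
set.seed(42)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- dat$x1 + 0.5 * dat$x2 + rnorm(100)
r2full <- summary(lm(y ~ x1 + x2, data = dat))$r.squared
r2nox1 <- summary(lm(y ~ x2, data = dat))$r.squared
deltaRsq.x1 <- r2full - r2nox1   # R-square attributable to adding x1 last
```

With correlated predictors these deltas do not sum to the full R-square, which is the non-partitionability point made above.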
[R] non ascill characters in plots. no alternative but plotmath?
A student entered some data with text characters like epsilon and alpha. On her Windows system, the Greek letters did not display properly in a plot; there were some ordinary ASCII characters instead. I asked her to send me the code so I could test. For me, the plot looks OK on the screen.

Format1 <- c(320,500,700,1000,500,320,700,500,320)
Format2 <- c(800,1000,1150,1400,1500,1650,1800,2300,2500)
Vowel <- c("u", "o", "α", "a", "ø", "y", "ε", "e", "i")
V1 <- data.frame(Format1, Format2, Vowel)
plot(Format1 ~ Format2, data = V1, type = "n")
text(V1$Format2, V1$Format1, labels = V1$Vowel)

On my Debian linux system, the plot shows the Greek letters just fine in the screen device. However, I turned on a pdf device to run the same code and see signs of trouble.

text(V1$Format2, V1$Format1, labels = V1$Vowel)
Warning messages:
1: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for <ce>
2: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for <b1>
3: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  font metrics unknown for Unicode character U+03b1
4: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for <ce>
5: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for <b1>
6: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for <ce>
7: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for <b5>
8: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  font metrics unknown for Unicode character U+03b5
9: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for <ce>
10: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion
  failure on 'ε' in 'mbcsToSbcs': dot substituted for <b5>

The alpha and epsilon characters don't appear in the pdf. I don't know the proper terminology to describe the situation, so I don't know where to start reading. Until very recently, I didn't even know it was possible to directly enter these characters in Emacs, but I've learned that part. I understand you might answer "use plotmath," and if that's the only workable thing, I will teach her how. But that's a bit of an uphill climb from where we are now standing. It will be a lot more work for me to teach about expressions and whatnot, so if there is a direct route from a column of non-ASCII characters to a plot that has those characters in it, I'd be glad to know. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
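[Editorial addendum, an assumption on my part rather than an answer from the thread: the cairo_pdf() device, available when R is built with cairo support, draws UTF-8 text directly instead of squeezing it through the 8-bit encoding that produces the mbcsToSbcs failures above. The filename is illustrative.]

```r
## Same non-ASCII labels, but on the cairo-based pdf device
cairo_pdf("vowels.pdf")
plot(1:2, 1:2, type = "n")
text(1.5, 1.5, labels = "\u03b5 \u03b1")   # epsilon and alpha, no plotmath
dev.off()
```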
Re: [R] EM algorithm to find MLE of coeff in mixed effects model
On Tue, Jul 3, 2012 at 12:41 PM, jimmycloud jimmycl...@gmail.com wrote: I have a general question about coefficient estimation in the mixed model.

I have 2 ideas for you.

1. Fit with the lme4 package, using the lmer function. That's what it is for.

2. If you really want to write your own EM algorithm, I don't feel sure that very many R and EM experts are going to want to read through your code, because you don't follow some of the minimal readability guidelines. I accept the fact that there is no officially mandated R style guide, except for "indent with 4 spaces, not tabs." But there are some almost universally accepted basics, like:

1. Use <-, not =, for assignment.
2. Put a space before and after symbols like <-, =, +, *, and so forth.
3. I'd suggest you get used to putting squiggly braces in the K&R style.

I have found the formatR package's tool tidy.source can do this nicely. From tidy.source, here's what I get with your code: http://pj.freefaculty.org/R/em2.R Much more readable! (I inserted library(MASS) for you at the top; otherwise this doesn't even survive the syntax check.) When I copy from email to Emacs, some line-breaking monkey-business occurs, but I expect you get the idea. Now, looking at your cleaned up code, I can see some things to tidy up to improve the chances that some expert will look. First, R functions don't require return at the end; many experts consider it distracting. (Run lm or such and review how they write. No return at the end of functions.) Second, about that big for loop at the top, the one that goes from m in 1:300. I don't know what that loop is doing, but there's some code in it that I'm suspicious of. For example, this part:

W.hat <- psi.old - psi.old %*% t(Zi) %*% solve(Sig.hat) %*% Zi %*% psi.old
Sigb <- psi.old - psi.old %*% t(Zi) %*% solve(Sig.hat) %*% Zi %*% psi.old

First, you've caused the possibly slow calculation of solve(Sig.hat) to run two times. If you really need it, run it once and save the result.
Second, a for loop is not necessarily slow, but it may be easier to read if you re-write this:

for (j in 1:n) {
    tempaa[j] = Eh4new(datai = data.list[[j]], weights.m = weights.mat, beta = beta.old)
    tempbb[j] <- Eh4newv2(datai = data.list[[j]], weights.m = weights.mat, beta = beta.old)
}

like this:

tempaa <- lapply(data.list, Eh4new, weights.m = weights.mat, beta = beta.old)
tempbb <- lapply(data.list, Eh4newv2, weights.m = weights.mat, beta = beta.old)

Third, here is a no-no:

tempbb <- c()
for (j in 1:n) {
    tempbb <- cbind(tempbb, Eh2new(data.list[[j]], weights.m = weights.mat))
}

That will call cbind over and over, causing a relatively slow memory re-allocation. See http://pj.freefaculty.org/R/WorkingExamples/stackListItems.R. Instead, do this to apply Eh2new to each item in data.list:

tempbbtemp <- lapply(data.list, Eh2new, weights.mat)

and then smash the results together in one go:

tempbb <- do.call(cbind, tempbbtemp)

Fourth, I'm not sure on the matrix algebra. Are you sure you need solve to get the full inverse of Sig.hat? Once you start checking into how estimates are calculated in R, you find that the paper-and-pencil matrix algebra style of formula is generally frowned upon. OLS (in lm.fit) doesn't do solve(X'X); it uses a QR decomposition of the matrices. Or look in the MASS package ridge regression code, where the SVD is used to get the inverse. I wish I knew enough about the EM algorithm to gaze at your code and say "aha, error in line 332," but I'm not that expert. But if you clean up the presentation and tighten up the most obvious things, you improve your chances that somebody who is an expert will exert him/herself to do it. pj

b follows N(0, \psi), i.e. bivariate normal, where b is the latent variable, Z and X are ni*2 design matrices, sigma is the error variance, Y are longitudinal data, i.e. there are ni measurements for object i. Parameters are \beta, \sigma, \psi; call them \theta. I wrote a regular EM; the M step is to maximize log(f(Y,b;\theta)) by taking first order derivatives, setting them to 0, and solving the equation.
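[Editorial addendum: the third point in miniature, with a toy function standing in for Eh2new so the two patterns can be compared directly.]

```r
f <- function(d) d * (1:3)               # stand-in for Eh2new
data.list <- list(1, 2, 3)

## no-no: grow the result by repeated cbind (re-allocates every pass)
slow <- c()
for (d in data.list) slow <- cbind(slow, f(d))

## better: build the pieces with lapply, bind once with do.call
fast <- do.call(cbind, lapply(data.list, f))
```

Both give the same 3x3 matrix; the difference is that the second allocates it once instead of n times.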
The E step involves the evaluation of the expectation, using Gauss-Hermite approximation (10 points). All data are simulated. X and Z are naive, like cbind(rep(1,m), 1:m). After 200 iterations, the estimated \beta converges to the true value, while \sigma and \psi do not. Even after one iteration, \sigma goes up from about 10^0 to about 10^1. I am confused, since \hat{\beta} requires \sigma and \psi from the previous iteration. If something were wrong, then all the estimates should be incorrect... Another question is that I calculated log f(Y;\theta) to see if it increases after updating \theta. It seems to decrease. I thought X and Z being linearly dependent would cause some issue, but I also changed X and Z to some random numbers from a normal distribution. I also tried ECM, which converges fast, but sigma and psi are not close to the desired values. Is this due
[R] good documentation on use of Rscript. where to find?
What is the correct format for the shebang line, and what options are allowed or necessary along with it? I find plenty of blogs and opinions, but few authoritative answers. I want an R script that runs and updates packages periodically, launched by a cron job. What is necessary to put in line one of the R program? Aside from the basic part

#!/usr/bin/Rscript

what else can there be, or should there be? Various people suggest adding things like --vanilla, but those seem not to be accepted, at least on RedHat EL 6 with R-2.15.1. I'd like to run this with a command line like:

$ ./R-labSelectInstall-02.R > /tmp/Rupdate.txt 2>&1

so that all the standard output and error goes to a text file, but I've not found a way to make it work comparably to the R command-line option --vanilla or such.

# ./R-labSelectInstall-02.R --vanilla
WARNING: unknown option '--vanilla'

See what I mean?

--
Paul E. Johnson
Professor, Political Science        Assoc. Director
1541 Lilac Lane, Room 504           Center for Research Methods
University of Kansas                University of Kansas
http://pj.freefaculty.org           http://quant.ku.edu
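For what it's worth, here is a hedged sketch of one way to set such a script up; the script body and repository URL are assumptions, not from the thread. On Linux the kernel passes everything after the interpreter path in a shebang as a single argument, which is one common reason extra options like --vanilla get rejected there; keeping the shebang bare and doing the setup inside the script sidesteps the problem:

```r
#!/usr/bin/Rscript
## Bare shebang: on Linux, everything after the interpreter path is
## passed as ONE argument, so "--vanilla" added here may not be parsed
## as a separate option. Do the equivalent setup inside the script.
options(repos = "https://cran.r-project.org")   # assumed mirror
update.packages(ask = FALSE, checkBuilt = TRUE)
```

From cron, redirect both streams the usual shell way: /path/to/script.R > /tmp/Rupdate.txt 2>&1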
[R] inverse for formula transformations on LHS
This is an R formula-handling question. It arose in class. We were working on the Animals data in the MASS package. In order to see a relationship, you need to log brain and body weight. It's a fun one for teaching regression, if you have not tried it yet. There are outliers, too! Students wanted to make a predicted-value plot in the non-logged values of y, for comparison, and I wondered if I couldn't automate this somehow for them. It made me wonder how R manages formulae and whether a transformation like log(y) can be mechanically inverted. So we have something concrete to talk about, suppose x and y are variables in dat, and a person fits

m1 <- lm(log(y) ~ log(x), data = dat)

termplot shows log(y) on the vertical. What if I want y on the vertical? Similarly, predict gives values on the log(y) scale; there's no argument like type = "untransformed". I want my solution to be a bit general, so that it would give back predicted y for formulae like sqrt(y) or exp(y) or log(y + d) or whatever other math people might throw in there.

Here's what I can tell so far about R's insides. The formula handler makes a list out of the formula; I can get that from the terms object that the model generates. The formula list has ~ as element [[1]], log(y) becomes element [[2]], and log(x) becomes element [[3]]. Where in the R source code can I see how R looks at the symbol log(y) and discerns that there is a variable y that needs to be logged? If I could understand that, and if R has a table of inverse functions, then maybe I could see what to do.

If you have ideas, I'm very grateful if you share them.

pj

--
Paul E. Johnson
Professor, Political Science        Assoc. Director
1541 Lilac Lane, Room 504           Center for Research Methods
University of Kansas                University of Kansas
http://pj.freefaculty.org           http://quant.ku.edu
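The parse-tree question above can be explored interactively; a short sketch (not a full solution to the inverse problem):

```r
f <- log(y) ~ log(x)
f[[1]]   # the `~` operator
f[[2]]   # the left-hand side: the call log(y)
f[[3]]   # the right-hand side: the call log(x)

## The LHS is itself a call object: element 1 is the function symbol,
## element 2 its argument -- this is how "there is a variable y that
## needs to be logged" can be discerned.
lhs <- f[[2]]
as.character(lhs[[1]])   # "log"
as.character(lhs[[2]])   # "y"

## One existing table of inverses lives in the GLM link machinery:
## make.link("log") returns the transformation and its inverse together.
linv <- make.link("log")$linkinv
linv(log(10))            # 10
```

So dispatch on `lhs[[1]]` plus a lookup table of inverses (exp for log, ^2 for sqrt, and so on) is one plausible route, though nothing built in covers arbitrary expressions like log(y + d).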
Re: [R] inverse for formula transformations on LHS
On Sat, May 18, 2013 at 7:05 AM, Stephen Milborrow mi...@sonic.net wrote: Paul Johnson pauljoh...@gmail.com wrote: m1 <- lm(log(y) ~ log(x), data = dat) termplot shows log(y) on the vertical. What if I want y on the vertical? plotmo in the plotmo package has an inverse.func argument, so something like the following might work for you?

Thanks, I will read about plotmo. I had not heard of that one before. It looks like we have been working on the same problem. Take a look at my package rockchalk and run the examples for plotSlopes and plotCurves: exact same ideas you are thinking about.

I don't think it will help in this case, because it still relies on the user to know that exp is the inverse of log. When users call predict on a glm, it just knows how to put predictions on the response or link scale, and I am aiming for something automagical like that.

library(MASS)
library(plotmo)
log.brain <- log(Animals$brain)
log.body <- log(Animals$body)
m2 <- lm(log.brain ~ log.body)
myplot <- function(...) plotmo(m2, do.par = FALSE, nrug = -1, col.resp = 2,
                               pch = 20, se = 1, main = "", xlab = "log(body)", ...)
par(mfrow = c(2, 2))
myplot(ylab = "log(brain)")
myplot(ylab = "brain", inverse.func = exp)
termplot(m2, se = TRUE, rug = TRUE)  # for comparison

--
Paul E. Johnson
Professor, Political Science        Assoc. Director
1541 Lilac Lane, Room 504           Center for Research Methods
University of Kansas                University of Kansas
http://pj.freefaculty.org           http://quant.ku.edu
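For the log case specifically, the back-transformation the thread is circling around can also be done by hand on the predict() output; a minimal sketch using the same Animals data (variable names invented):

```r
library(MASS)   # for the Animals data

## Fit on the log scale, then apply the known inverse (exp) to put
## fitted values back on the original brain scale.
m1 <- lm(log(brain) ~ log(body), data = Animals)

newdat <- data.frame(body = exp(seq(log(0.02), log(90000), length.out = 50)))
pred.log <- predict(m1, newdata = newdat)   # on the log(brain) scale
pred.brain <- exp(pred.log)                 # back on the brain scale

plot(brain ~ body, data = Animals, log = "xy")
lines(newdat$body, pred.brain, col = "red")
```

One caveat worth stating: with additive errors on the log scale, exp() of the predicted mean estimates the median of y rather than its mean, so a smearing-type correction is sometimes wanted if E[y] is the target.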