[R] hausman test
i am dealing with my panle data,and i have to decided if i should use fixed-effect model or random effect model.i know the hausman test can help. and anyone knows if any function can do these? thank you. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] locator() via tcltk
You have two asynchronous processes here: your R function will return, leaving a Tk widget up. Duncan Murdoch's solution is to give you a function that will retrieve at some future stage the result of the last press (if any) of Get coordinates button. Is that what you want? I think it is more likely you want to wait for the Tk interaction and then return the results, that is use a `modal' widget. If so, take a look at the examples in src/library/tcltk/R/utils.R which are modal and return their results. On Fri, 3 Jun 2005, Sebastian Luque wrote: Hello, I'm trying to write a function using tcltk to interactively modify a plot and gather locator() data. I've read Peter's articles in Rnews, the help pages in tcltk, http://bioinf.wehi.edu.au/~wettenhall/RTclTkExamples/, plus a post in R-help sometime ago, but haven't found a solution. The idea goes something like this: require(tcltk) testplot - function() { getcoords - function(...) { locator(5) } x - 1:1000 y - rnorm(1000) plot(x, y) base - tktoplevel() loc.pts - tkbutton(base, text = Get coordinates, command = getcoords) quit.but - tkbutton(base, text = Quit, command = function() tkdestroy(base)) tkpack(loc.pts, quit.but) } I'd like testplot to return the value from getcoords. The Get coordinates button seems to be correctly calling getcoords, and locator is doing its job, but I don't know how to store its value. Any help is very much appreciated. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] locator() via tcltk
To illustrate (?) Professor Ripley comments, I send you an example (stolen in various places...). Note the tkwm.resizable(tt, 0, 0) directive that prevents the window rescaling (if not,the coordinates will not be correct). # # Getting the mouse coords with TclTk # # Two possibilities: tkrplot package or Img library # # A) USING the tkrplot package # # 1a) Opening an X11 device named XImage (current device) X11(XImage,width=600, height=600) # 2a) Putting a contour... x - 10*(1:nrow(volcano)) y - 10*(1:ncol(volcano)) image(x, y, volcano, col = terrain.colors(100), axes = FALSE) contour(x, y, volcano, levels = seq(90, 200, by = 5), add = TRUE, col = peru) axis(1, at = seq(100, 800, by = 100)) axis(2, at = seq(100, 600, by = 100)) box() title(main = Maunga Whau Volcano, font.main = 4) # image dimensions (before XImage deletion) parPlotSize - par('plt') usrCoords - par('usr') # 3a) tkrplot loading (that load tcltk package) #The tcltk directive extended by tkrplot # image create Rplot truc #put the XImage image in the image Tcl objet named truc #and close XImage library(tkrplot) .Tcl(image create Rplot truc) # # B) USING the Tcl Img library (for .jpg and .png) # 1b) Opening a jpeg (or png) device jpeg(monimage.jpg,width=800, height=800) # 2b) = 2) + closing jpeg device x - 10*(1:nrow(volcano)) y - 10*(1:ncol(volcano)) image(x, y, volcano, col = terrain.colors(100), axes = FALSE) contour(x, y, volcano, levels = seq(90, 200, by = 5), add = TRUE, col = peru) axis(1, at = seq(100, 800, by = 100)) axis(2, at = seq(100, 600, by = 100)) box() title(main = Maunga Whau Volcano, font.main = 4) # image dimensions parPlotSize - par('plt') usrCoords - par('usr') dev.off() # 3b) Reading the monimage file in the Tcl objet named truc # and deleting the file library(tcltk) tclRequire('Img') # to read a .jpg file truc - tclVar() tkcmd(image,create,photo,truc,file=monimage.jpg) unlink(monimage.jpg) # # COMMON STEPS for both possibilities # 4) Opening the Tk window and putting the image inside tt - tktoplevel() tkpack(ll - tklabel(tt, image=truc), fill=both) tkwm.resizable(tt, 0, 0) # non resizable window # 5) Callback # Calculus of dimensions initial image/tk image width - as.numeric(tclvalue(tkwinfo(reqwidth,ll))) height - as.numeric(tclvalue(tkwinfo(reqheight,ll))) xMin - parPlotSize[1] * width xRange - (parPlotSize[2] - parPlotSize[1]) * width rangeX - (usrCoords[2] - usrCoords[1])/xRange yMin - parPlotSize[3] * height yRange - (parPlotSize[4] - parPlotSize[3]) * height rangeY - (usrCoords[4] - usrCoords[3])/yRange OnLeftClick = function(x,y) { xClick - as.numeric(x) yClick - height - as.numeric(y) xPlotCoord - usrCoords[1]+(xClick-xMin)*rangeX yPlotCoord - usrCoords[3]+(yClick-yMin)*rangeY ## AN ACTION ## print(c(xPlotCoord, yPlotCoord)) } tkbind(tt, Button-1, OnLeftClick) # Action tkbind(tt, Button-3, function() tclvalue(done) = 1) # Exit # 6) Loop waiting the events done = tclVar(0) tkwait.variable(done) tkdestroy(tt) ### -- Selon Prof Brian Ripley [EMAIL PROTECTED]: You have two asynchronous processes here: your R function will return, leaving a Tk widget up. Duncan Murdoch's solution is to give you a function that will retrieve at some future stage the result of the last press (if any) of Get coordinates button. Is that what you want? I think it is more likely you want to wait for the Tk interaction and then return the results, that is use a `modal' widget. If so, take a look at the examples in src/library/tcltk/R/utils.R which are modal and return their results. On Fri, 3 Jun 2005, Sebastian Luque wrote: Hello, I'm trying to write a function using tcltk to interactively modify a plot and gather locator() data. I've read Peter's articles in Rnews, the help pages in tcltk, http://bioinf.wehi.edu.au/~wettenhall/RTclTkExamples/, plus a post in R-help sometime ago, but haven't found a solution. The idea goes something like this: require(tcltk) testplot - function() { getcoords - function(...) { locator(5) } x - 1:1000 y - rnorm(1000) plot(x, y) base - tktoplevel() loc.pts - tkbutton(base, text = Get coordinates, command = getcoords) quit.but - tkbutton(base, text = Quit, command = function() tkdestroy(base)) tkpack(loc.pts, quit.but) } I'd like testplot to return the value from getcoords. The Get coordinates button seems to be correctly calling getcoords,
Re: [R] How to 'de-cross' a table?
Hi Thomas, Your code works perfectly! Thanks a lot, Sander. Thomas Lumley wrote: On Fri, 3 Jun 2005, Sander Oom wrote: Dear R users, I have received a table in the following format: id a b c1 c2 d1 d2 1 1 1 65 97 78 98 2 1 2 65 97 42 97 3 2 1 65 68 97 98 4 2 2 65 97 97 98 Factors of the design are: a, b, and e, where e has levels c and d. The levels c and d then have 2 replicates (r) each. Now I would like to get: id a b e r value 1 1 1 c 1 65 2 1 1 c 2 97 3 1 1 d 1 78 4 1 1 d 2 98 Any suggestions on how to tackle this? I'm not sure what the 'task' is called, so struggle to effectively search the web or R help. reshape() is the probably function. It seems you need to use it twice, for the two levels of uncrossing d1-reshape(d, direction=long, varying=list(c(c1,d1),c(c2,d2)), v.names=c(rep1,rep2),timevar=r, times=c(c,d)) d1 id a b r rep1 rep2 1.c 1 1 1 c 65 97 2.c 2 1 2 c 65 97 3.c 3 2 1 c 65 68 4.c 4 2 2 c 65 97 1.d 1 1 1 d 78 98 2.d 2 1 2 d 42 97 3.d 3 2 1 d 97 98 4.d 4 2 2 d 97 98 reshape(d1, direction=long, varying=list(c(rep1,rep2)), v.names=value,idvar=tmp, timevar=r) id a b r value tmp 1.1 1 1 1 165 1 2.1 2 1 2 165 2 3.1 3 2 1 165 3 4.1 4 2 2 165 4 5.1 1 1 1 178 5 6.1 2 1 2 142 6 7.1 3 2 1 197 7 8.1 4 2 2 197 8 1.2 1 1 1 297 1 2.2 2 1 2 297 2 3.2 3 2 1 268 3 4.2 4 2 2 297 4 5.2 1 1 1 298 5 6.2 2 1 2 297 6 7.2 3 2 1 298 7 8.2 4 2 2 298 8 You can then delete the `tmp` variable if you don't need it. An easier way to uncross would be with stack() stack(d[,4:7]) values ind 1 65 c1 2 65 c1 3 65 c1 4 65 c1 5 97 c2 6 97 c2 7 68 c2 8 97 c2 9 78 d1 10 42 d1 11 97 d1 12 97 d1 13 98 d2 14 97 d2 15 98 d2 16 98 d2 but you lose the other variables. -thomas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Dr Sander P. Oom Animal, Plant and Environmental Sciences, University of the Witwatersrand Private Bag 3, Wits 2050, South Africa Tel (work) +27 (0)11 717 64 04 Tel (home) +27 (0)18 297 44 51 Fax +27 (0)18 299 24 64 Email [EMAIL PROTECTED] Web www.oomvanlieshout.net/sander __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] can R do Fixed-effects (within) regression (panel data)?
i want to ask 2 questions. 1) can R do Random-effects GLS regression which i can get from Stata? the following result is frome Stata.can I get the alike result from R? xtreg lwage educ black hisp exper expersq married union, re Random-effects GLS regression Number of obs = 4360 Group variable (i) : nr Number of groups = 545 R-sq: within = 0.1799 Obs per group: min = 8 between = 0.1860avg = 8.0 overall = 0.1830max = 8 Random effects u_i ~ Gaussian Wald chi2(14) =957.77 corr(u_i, X) = 0 (assumed)Prob chi2=0. -- lwage | Coef. Std. Err. zP|z| [95% Conf. Interval] -+ educ | .0918763 .0106597 8.62 0.000 .0709836.1127689 .. d86 | .0919476 .0712293 1.29 0.197-.0476592.2315544 d87 | .1349289 .0813135 1.66 0.097-.0244427.2943005 _cons | .0235864 .1506683 0.16 0.876 -.271718.3188907 -+ sigma_u | .32460315 sigma_e | .35099001 rho | .46100216 (fraction of variance due to u_i) 2) can R do Fixed-effects (within) regression as Stata's xtreg? the followng example is from Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge Chapter 14 - Advanced Panel Data Methods use http://fmwww.bc.edu/ec-p/data/wooldridge/JTRAIN iis fcode tis year xtreg lscrap d88 d89 grant grant_1, fe Fixed-effects (within) regression Number of obs = 162 Group variable (i) : fcode Number of groups =54 R-sq: within = 0.2010 Obs per group: min = 3 between = 0.0079avg = 3.0 overall = 0.0068max = 3 F(4,104) = 6.54 corr(u_i, Xb) = -0.0714Prob F =0.0001 -- lscrap | Coef. Std. Err. tP|t| [95% Conf. Interval] -+ d88 | -.0802157 .1094751-0.73 0.465-.2973089.1368776 d89 | -.2472028 .1332183-1.86 0.066-.5113797 .016974 grant | -.2523149.150629-1.68 0.097-.5510178 .046388 grant_1 | -.4215895 .2102-2.01 0.047-.8384239 -.0047551 _cons |.597434 .0677344 8.82 0.000 .4631142.7317539 -+ sigma_u | 1.438982 sigma_e | .4977442 rho | .89313867 (fraction of variance due to u_i) -- F test that all u_i=0: F(53, 104) =24.66 Prob F = 0. thank you!! __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] barplot and missing values?
I want to include missing values in my barplot to get the correct x-axis, for example, x - c(1,2,3,4, 9) y - c(2,4,6,8,18) barplot(y) The above looks wrong because the last height in y should be a long way over. So I want to do something like... x - c(1,2,3,4,5,6,7,8, 9) y - c(2,4,6,8,0,0,0,0,18) barplot(y) However... I am actually using barplot2 to use the log='y' function, so I can't use zero values on a log scale... So I need... x - c(1,2,3,4, 5, 6, 7, 8, 9) y - c(2,4,6,8,NA,NA,NA,NA,18) barplot(y) But that don't work. Am I missing something? To avoid confusion here is my data... dat.y.plot [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] Ho 653 80 132 10 343 1007 2 7 7 He 139 56 696 243 1132 1 2 6 attr(,names) [1] 2 3 4 5 6 7 8 9 10 11 12 12 [13] NANANANANANANANANANANANA And here is what I call... barplot(dat.y.plot, ylim=c(0,max(dat.y.plot + 50)), # I don't like the default beside=T, names.arg=c('2','3','4','5','6','7','8','9','10','11','12','12'), cex.axis=1.5, cex.names=1.5, legend=T ) Which is fine (except I don't know why I still need names.arg). Then I try... library(gregmisc) barplot2(dat.y.plot+1, log='y', beside=T, names.arg=c('2','3','4','5','6','7','8','9','10','11','12','12'), cex.axis=1.5, cex.names=1.5, legend=T ) Which fails because of the zero. If I try ... dat.y.plot[dat.y.plot==0] - NA It fails because of the NA. Any suggestions? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] barplot and missing values?
Dan Bolser wrote: I want to include missing values in my barplot to get the correct x-axis, for example, x - c(1,2,3,4, 9) y - c(2,4,6,8,18) barplot(y) The above looks wrong because the last height in y should be a long way over. So I want to do something like... x - c(1,2,3,4,5,6,7,8, 9) y - c(2,4,6,8,0,0,0,0,18) barplot(y) However... I am actually using barplot2 to use the log='y' function, so I can't use zero values on a log scale... So I need... x - c(1,2,3,4, 5, 6, 7, 8, 9) y - c(2,4,6,8,NA,NA,NA,NA,18) barplot(y) But that don't work. Actually, it works, at least for me (R-2.1.0, WinNT, but you have not told us your details!). BTW: In the meantime package gregmisc has been superseded by the gregmisc bundle, and later on by a number of packages (such as gtools, gdata, ...). Your setup seems to be rather outdated. Uwe Ligges Am I missing something? To avoid confusion here is my data... dat.y.plot [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] Ho 653 80 132 10 343 1007 2 7 7 He 139 56 696 243 1132 1 2 6 attr(,names) [1] 2 3 4 5 6 7 8 9 10 11 12 12 [13] NANANANANANANANANANANANA And here is what I call... barplot(dat.y.plot, ylim=c(0,max(dat.y.plot + 50)), # I don't like the default beside=T, names.arg=c('2','3','4','5','6','7','8','9','10','11','12','12'), cex.axis=1.5, cex.names=1.5, legend=T ) Which is fine (except I don't know why I still need names.arg). Then I try... library(gregmisc) barplot2(dat.y.plot+1, log='y', beside=T, names.arg=c('2','3','4','5','6','7','8','9','10','11','12','12'), cex.axis=1.5, cex.names=1.5, legend=T ) Which fails because of the zero. If I try ... dat.y.plot[dat.y.plot==0] - NA It fails because of the NA. Any suggestions? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] barplot and missing values?
On Sat, 4 Jun 2005, Uwe Ligges wrote: Dan Bolser wrote: I want to include missing values in my barplot to get the correct x-axis, for example, x - c(1,2,3,4, 9) y - c(2,4,6,8,18) barplot(y) The above looks wrong because the last height in y should be a long way over. So I want to do something like... x - c(1,2,3,4,5,6,7,8, 9) y - c(2,4,6,8,0,0,0,0,18) barplot(y) However... I am actually using barplot2 to use the log='y' function, so I can't use zero values on a log scale... So I need... x - c(1,2,3,4, 5, 6, 7, 8, 9) y - c(2,4,6,8,NA,NA,NA,NA,18) barplot(y) But that don't work. Actually, it works, at least for me (R-2.1.0, WinNT, but you have not told us your details!). BTW: In the meantime package gregmisc has been superseded by the gregmisc bundle, and later on by a number of packages (such as gtools, gdata, ...). Your setup seems to be rather outdated. yeah :( R 2.0.0 (2004-10-04). I will upgrade to 2.1.0 (latest stable?) Instead of gregmisc what should I use to get barplot2? Will barplot() ever become barplot2() I will try upgrading Dan. Uwe Ligges Am I missing something? To avoid confusion here is my data... dat.y.plot [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] Ho 653 80 132 10 343 1007 2 7 7 He 139 56 696 243 1132 1 2 6 attr(,names) [1] 2 3 4 5 6 7 8 9 10 11 12 12 [13] NANANANANANANANANANANANA And here is what I call... barplot(dat.y.plot, ylim=c(0,max(dat.y.plot + 50)), # I don't like the default beside=T, names.arg=c('2','3','4','5','6','7','8','9','10','11','12','12'), cex.axis=1.5, cex.names=1.5, legend=T ) Which is fine (except I don't know why I still need names.arg). Then I try... library(gregmisc) barplot2(dat.y.plot+1, log='y', beside=T, names.arg=c('2','3','4','5','6','7','8','9','10','11','12','12'), cex.axis=1.5, cex.names=1.5, legend=T ) Which fails because of the zero. If I try ... dat.y.plot[dat.y.plot==0] - NA It fails because of the NA. Any suggestions? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] barplot and missing values?
Dan Bolser wrote: On Sat, 4 Jun 2005, Uwe Ligges wrote: Dan Bolser wrote: I want to include missing values in my barplot to get the correct x-axis, for example, x - c(1,2,3,4, 9) y - c(2,4,6,8,18) barplot(y) The above looks wrong because the last height in y should be a long way over. So I want to do something like... x - c(1,2,3,4,5,6,7,8, 9) y - c(2,4,6,8,0,0,0,0,18) barplot(y) However... I am actually using barplot2 to use the log='y' function, so I can't use zero values on a log scale... So I need... x - c(1,2,3,4, 5, 6, 7, 8, 9) y - c(2,4,6,8,NA,NA,NA,NA,18) barplot(y) But that don't work. Actually, it works, at least for me (R-2.1.0, WinNT, but you have not told us your details!). BTW: In the meantime package gregmisc has been superseded by the gregmisc bundle, and later on by a number of packages (such as gtools, gdata, ...). Your setup seems to be rather outdated. yeah :( R 2.0.0 (2004-10-04). Hmm. I just tested with R-1.9.1, and your last example even works with that one... I will upgrade to 2.1.0 (latest stable?) Instead of gregmisc what should I use to get barplot2? package gplots You example y is also handled perfectly well by barplot2() on my system, BTW. Uwe Ligges Will barplot() ever become barplot2() I will try upgrading Dan. Uwe Ligges Am I missing something? To avoid confusion here is my data... dat.y.plot [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] Ho 653 80 132 10 343 1007 2 7 7 He 139 56 696 243 1132 1 2 6 attr(,names) [1] 2 3 4 5 6 7 8 9 10 11 12 12 [13] NANANANANANANANANANANANA And here is what I call... barplot(dat.y.plot, ylim=c(0,max(dat.y.plot + 50)), # I don't like the default beside=T, names.arg=c('2','3','4','5','6','7','8','9','10','11','12','12'), cex.axis=1.5, cex.names=1.5, legend=T ) Which is fine (except I don't know why I still need names.arg). Then I try... library(gregmisc) barplot2(dat.y.plot+1, log='y', beside=T, names.arg=c('2','3','4','5','6','7','8','9','10','11','12','12'), cex.axis=1.5, cex.names=1.5, legend=T ) Which fails because of the zero. If I try ... dat.y.plot[dat.y.plot==0] - NA It fails because of the NA. Any suggestions? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] barplot and missing values?
On Sat, 4 Jun 2005, Uwe Ligges wrote: Dan Bolser wrote: On Sat, 4 Jun 2005, Uwe Ligges wrote: Dan Bolser wrote: I want to include missing values in my barplot to get the correct x-axis, for example, x - c(1,2,3,4, 9) y - c(2,4,6,8,18) barplot(y) The above looks wrong because the last height in y should be a long way over. So I want to do something like... x - c(1,2,3,4,5,6,7,8, 9) y - c(2,4,6,8,0,0,0,0,18) barplot(y) However... I am actually using barplot2 to use the log='y' function, so I can't use zero values on a log scale... So I need... x - c(1,2,3,4, 5, 6, 7, 8, 9) y - c(2,4,6,8,NA,NA,NA,NA,18) barplot(y) But that don't work. Actually, it works, at least for me (R-2.1.0, WinNT, but you have not told us your details!). BTW: In the meantime package gregmisc has been superseded by the gregmisc bundle, and later on by a number of packages (such as gtools, gdata, ...). Your setup seems to be rather outdated. yeah :( R 2.0.0 (2004-10-04). Hmm. I just tested with R-1.9.1, and your last example even works with that one... I will upgrade to 2.1.0 (latest stable?) Instead of gregmisc what should I use to get barplot2? package gplots You example y is also handled perfectly well by barplot2() on my system, BTW. This must be because of the log='y' option that I am using here. y - c(2,4,6,8,NA,NA,NA,NA,18) barplot2(y,log='y') Above fails. I appreciate that what I am trying to do is somewhat artificial (handle zero values on a log scale), but it does reflect the data I have. I tried plot(..., type='h'), but that dosn't do the beside=T stuff that I want to do. I am now trying things like... barplot2( dat.y.plot + 0.11, # Dirty hack offset=-0.1, # xpd=F, # log='y', beside=T ) Which looks messy. Any way to cleanly handle NA values with barplot2 on a log scale (log='y')? Thanks for your help :) Uwe Ligges Will barplot() ever become barplot2() I will try upgrading Dan. Uwe Ligges Am I missing something? To avoid confusion here is my data... dat.y.plot [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] Ho 653 80 132 10 343 1007 2 7 7 He 139 56 696 243 1132 1 2 6 attr(,names) [1] 2 3 4 5 6 7 8 9 10 11 12 12 [13] NANANANANANANANANANANANA And here is what I call... barplot(dat.y.plot, ylim=c(0,max(dat.y.plot + 50)), # I don't like the default beside=T, names.arg=c('2','3','4','5','6','7','8','9','10','11','12','12'), cex.axis=1.5, cex.names=1.5, legend=T ) Which is fine (except I don't know why I still need names.arg). Then I try... library(gregmisc) barplot2(dat.y.plot+1, log='y', beside=T, names.arg=c('2','3','4','5','6','7','8','9','10','11','12','12'), cex.axis=1.5, cex.names=1.5, legend=T ) Which fails because of the zero. If I try ... dat.y.plot[dat.y.plot==0] - NA It fails because of the NA. Any suggestions? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] barplot and missing values?
Dan Bolser wrote: [all previous stuff deleted] I see, what comes out of this longish thread is: - barplot() and barplot2() both have deficiencies for you particular examples, so it is time to provide patches for both barplot() and barplot2() (for the latter, you might want to contact the package maintainer as well) ... - Please provide *reproducible* examples (yours was not, because log='y' was missing). Hence the relevant example we were obviously talking about is: y - c(2,4,6,8,NA,NA,NA,NA,18) barplot(y, log=y) library(gplots) barplot2(y, log=y) Uwe Ligges __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] barplot and missing values?
On Sat, 2005-06-04 at 14:50 +0100, Dan Bolser wrote: snip This must be because of the log='y' option that I am using here. y - c(2,4,6,8,NA,NA,NA,NA,18) barplot2(y,log='y') Above fails. I appreciate that what I am trying to do is somewhat artificial (handle zero values on a log scale), but it does reflect the data I have. I tried plot(..., type='h'), but that dosn't do the beside=T stuff that I want to do. I am now trying things like... barplot2( dat.y.plot + 0.11, # Dirty hack offset=-0.1, # xpd=F, # log='y', beside=T ) Which looks messy. Any way to cleanly handle NA values with barplot2 on a log scale (log='y')? snip Dan, You are actually close in the above example, using the 'offset' argument. In this case, you still cannot use NAs, since their value is unknown and so must set these elements to zero. Then using a small offset value, you can adjust the base value of the y axis so that it is just above zero. This should result in a minimal shift of the bar values above their actual values and should not materially affect the plot's representation of the data. Something like the following should work: y - c(2, 4, 6, 8, NA, NA, NA, NA, 18) y [1] 2 4 6 8 NA NA NA NA 18 y[is.na(y)] - 0 y [1] 2 4 6 8 0 0 0 0 18 barplot2(y, log = y, offset = 0.01, las = 2) Note also that if you follow the above with: box() The residual bars from the (0 + 0.01) values are covered with the plot region box, if that is an issue for you. This is still something of a hack, but it is a little cleaner. The key of course is to avoid the use of a bar value of log(x), where x = 0. Selecting the proper offset value based upon your actual data is important so as to minimally affect the values visually. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] can R do Fixed-effects (within) regression (panel data)?
On 6/4/05, ronggui [EMAIL PROTECTED] wrote: i want to ask 2 questions. 1) can R do Random-effects GLS regression which i can get from Stata? the following result is frome Stata.can I get the alike result from R? xtreg lwage educ black hisp exper expersq married union, re Random-effects GLS regression Number of obs = 4360 Group variable (i) : nr Number of groups = 545 R-sq: within = 0.1799 Obs per group: min = 8 between = 0.1860avg = 8.0 overall = 0.1830max = 8 Random effects u_i ~ Gaussian Wald chi2(14) =957.77 corr(u_i, X) = 0 (assumed)Prob chi2=0. -- lwage | Coef. Std. Err. zP|z| [95% Conf. Interval] -+ educ | .0918763 .0106597 8.62 0.000 .0709836.1127689 .. d86 | .0919476 .0712293 1.29 0.197-.0476592.2315544 d87 | .1349289 .0813135 1.66 0.097-.0244427.2943005 _cons | .0235864 .1506683 0.16 0.876 -.271718.3188907 -+ sigma_u | .32460315 sigma_e | .35099001 rho | .46100216 (fraction of variance due to u_i) 2) can R do Fixed-effects (within) regression as Stata's xtreg? the followng example is from Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge Chapter 14 - Advanced Panel Data Methods use http://fmwww.bc.edu/ec-p/data/wooldridge/JTRAIN iis fcode tis year xtreg lscrap d88 d89 grant grant_1, fe Fixed-effects (within) regression Number of obs = 162 Group variable (i) : fcode Number of groups =54 R-sq: within = 0.2010 Obs per group: min = 3 between = 0.0079avg = 3.0 overall = 0.0068max = 3 F(4,104) = 6.54 corr(u_i, Xb) = -0.0714Prob F =0.0001 -- lscrap | Coef. Std. Err. tP|t| [95% Conf. Interval] -+ d88 | -.0802157 .1094751-0.73 0.465-.2973089.1368776 d89 | -.2472028 .1332183-1.86 0.066-.5113797 .016974 grant | -.2523149.150629-1.68 0.097-.5510178 .046388 grant_1 | -.4215895 .2102-2.01 0.047-.8384239 -.0047551 _cons |.597434 .0677344 8.82 0.000 .4631142.7317539 -+ sigma_u | 1.438982 sigma_e | .4977442 rho | .89313867 (fraction of variance due to u_i) -- F test that all u_i=0: F(53, 104) =24.66 Prob F = 0. I'm not sure what the models being fit by Stata are but I imagine that they correspond to models that can be fit in R by lmer (package lme4) or lme (package nlme). __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] barplot and missing values?
On Sat, 4 Jun 2005, Marc Schwartz wrote: On Sat, 2005-06-04 at 14:50 +0100, Dan Bolser wrote: snip This must be because of the log='y' option that I am using here. y - c(2,4,6,8,NA,NA,NA,NA,18) barplot2(y,log='y') Above fails. I appreciate that what I am trying to do is somewhat artificial (handle zero values on a log scale), but it does reflect the data I have. I tried plot(..., type='h'), but that dosn't do the beside=T stuff that I want to do. I am now trying things like... barplot2( dat.y.plot + 0.11, # Dirty hack offset=-0.1, # xpd=F, # log='y', beside=T ) Which looks messy. Any way to cleanly handle NA values with barplot2 on a log scale (log='y')? snip Dan, You are actually close in the above example, using the 'offset' argument. In this case, you still cannot use NAs, since their value is unknown and so must set these elements to zero. Then using a small offset value, you can adjust the base value of the y axis so that it is just above zero. This should result in a minimal shift of the bar values above their actual values and should not materially affect the plot's representation of the data. Something like the following should work: y - c(2, 4, 6, 8, NA, NA, NA, NA, 18) y [1] 2 4 6 8 NA NA NA NA 18 y[is.na(y)] - 0 y [1] 2 4 6 8 0 0 0 0 18 barplot2(y, log = y, offset = 0.01, las = 2) Note also that if you follow the above with: box() The residual bars from the (0 + 0.01) values are covered with the plot region box, if that is an issue for you. Actually it looks a bit strange (I guess you didn't check it?) - I see what is happening. It isn't much different from... barplot2(y+0.01, log = y,las = 1) Which is the essence of the fix, but all that bar (on a log scale) between 1 and 0.1 and 0.01 is as big as 1 to 10, which is a bit artificial. My previous fix looks best now I check it with the example ... y y [1] 2 4 6 8 0 0 0 0 18 barplot2( y + 0.11, ylim=c(1,max(y)), offset = -0.10, log='y', xpd=F ) box() Looks like the above is what I need :) Thanks for teh help - its reasuring to see similar fixes :) This is still something of a hack, but it is a little cleaner. The key of course is to avoid the use of a bar value of log(x), where x = 0. Selecting the proper offset value based upon your actual data is important so as to minimally affect the values visually. HTH, Marc Schwartz __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] How to change the value of a class slot
Ross == Ross Boylan [EMAIL PROTECTED] on Fri, 03 Jun 2005 17:04:08 -0700 writes: Ross I defined an S4 class with a slot i. Then I wrote a regular function Ross that attempted to increment i. Ross This didn't work, apparently because of the general rule that a function Ross can't change the values of its arguments outside the function. I gather Ross there are ways around it, but the Green book admonishes cheating on the Ross S evaluation model is to be avoided (p. 190). Ross Thinking that class methods needed to an exception to this rule, I then Ross tried setMethod with the function I had written. However, when I called Ross the function I got setMethod(nextPath, CompletePathMaker, nextPath) Ross Creating a new generic function for 'nextPath' in '.GlobalEnv' Ross [1] nextPath nextPath(pm) Ross Error: protect(): protection stack overflow Ross I can change the value of the slot interactively, so the problem does Ross not appear to be that the slots are considered off-limits. Ross What do I need to do to update slot values? Ross Here are some possibly relevant code fragments Ross setClass(CompletePathMaker, Ross representation(i=integer, Ross timeOffset=numeric, # to avoid 0's Ross truePaths=TruePaths) Ross ) Ross nextPath - function(pm){ #pm is a CompletePathMaker Ross[EMAIL PROTECTED] - [EMAIL PROTECTED](1) Ross [etc] If your nextPath function has 'pm' as its last statement it will return the updated object, and if you call it as mypm - nextPath(mypm) you are 1) updating mypm 2) in a proper S way (i.e. no cheating). Regards, Martin Ross I'm trying to make the class behave like an iterator, with i keeping Ross track of its location. I'm sure there are more R'ish ways to go, but Ross I'm also pretty sure I'm going to want to be able to update slots. Ross Thanks. Ross Ross Boylan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Storing data frame in a RDBMS
On 6/4/05, Adam Witney [EMAIL PROTECTED] wrote: Hi, Is there anyway to store a data frame in a database, and by this I mean the binary itself, not the contents? I am using PL/R in PostgreSQL amd have written some functions to build my data frame. However this can take some time with some large datasets and I would like to not have to repeat the process and so I would like to save the data frame. Rather than save/load into the file system I would like to be able to save the entire data frame as a single object in the database Is this possible? Thanks for any help Check out ?serialize __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] ts.intersect a multivariate and univariate ts
Adam: Providing a reproducible example would be a first step... That's the problem, I can't. But I str has come to the rescue: R str(rw) Time-Series [1:307] from 1690 to 1996: 0.986 1.347 1.502 1.594 1.475 ... R str(pg) List of 264 $ : num 0.227 $ : num 0.189 $ : num 0.237 $ : num 0.235 . . . . - attr(*, dim)= int [1:2] 22 12 - attr(*, dimnames)=List of 2 ..$ : NULL ..$ : chr [1:12] Series 1 Series 2 Series 3 Series 4 ... - attr(*, tsp)= num [1:3] 1982 20031 - attr(*, class)= chr [1:2] mts ts Why is pg a list? pg was created by taking a row out of a much larger data.frame: R pg - ts(matrix(monthly.pg[1,-c(1:4)], ncol = 12, byrow = T), start = 1982) R class(pg) [1] mts ts R mode(pg) [1] list So, changing the mode to numeric, allowed me to intersect the ts: R pg - ts(matrix(as.numeric(monthly.pg[1,-c(1:4)]), ncol = 12, byrow = T), start = 1982) R tsp(ts.intersect(rw, pg)) [1] 1982 19961 Weird. I suppose keeping on top of every object's mode is important. Thanks for the push forward. -Andy __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] ts.intersect a multivariate and univariate ts
Andy Bunn abunn at whrc.org writes: : : Adam: : Providing a reproducible example would be a first step... : : That's the problem, I can't. But I str has come to the rescue: We can provide it in a reproducible way like this: dput(rw) dput(pg) That will output both in a format that anyone else can paste back into R from your email. : : R str(rw) : Time-Series [1:307] from 1690 to 1996: 0.986 1.347 1.502 1.594 1.475 ... : R str(pg) : List of 264 : $ : num 0.227 : $ : num 0.189 : $ : num 0.237 : $ : num 0.235 : : . : . : . : . : - attr(*, dim)= int [1:2] 22 12 : - attr(*, dimnames)=List of 2 : ..$ : NULL : ..$ : chr [1:12] Series 1 Series 2 Series 3 Series 4 ... : - attr(*, tsp)= num [1:3] 1982 20031 : - attr(*, class)= chr [1:2] mts ts : : Why is pg a list? pg was created by taking a row out of a much larger : data.frame: A data frame is a list with one component per column so taking a matrix of a data frame gives something like this: R matrix(iris) [,1] [1,] Numeric,150 [2,] Numeric,150 [3,] Numeric,150 [4,] Numeric,150 [5,] factor,150 which is a matrix each of whose elements is a column of the original data frame. What you really want here is 'as.matrix' or 'data.matrix', not 'matrix'. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] glm with a distribution free family
Dear R users, I am trying to fit a glm with a distribution free family, link = log and variance = constant*mu. I guess I have to use the quasi family but the choices of variance are restricted to constant or mu or mu^2..., I don't know the way to choose the variance that I need, i.e. constant*mu. If you have any ideas or advice, please tell me. Thanks, Laetitia Mestdagh Laetitia Mestdagh élève en 3ème année a l'ISUP( Université de Paris VI) 13 rue Ernest Chamblain 91250 St Germain les Corbeil Tel : 01 69 89 28 15 Portable : 06 18 44 53 26 - ils, photos et vidéos ! [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Storing data frame in a RDBMS
Gabor Grothendieck wrote: On 6/4/05, Adam Witney [EMAIL PROTECTED] wrote: I am using PL/R in PostgreSQL amd have written some functions to build my data frame. However this can take some time with some large datasets and I would like to not have to repeat the process and so I would like to save the data frame. Rather than save/load into the file system I would like to be able to save the entire data frame as a single object in the database Is this possible? Check out ?serialize Looks like serialize should work nicely: create or replace function test_serialize(text) returns text as ' mydf - pg.spi.exec(arg1) return (serialize(mydf, NULL, ascii = TRUE)) ' language 'plr'; create table saved_df (id int, df text); insert into saved_df select 1, f from test_serialize('select oid, typname from pg_type where typname = ''oid'' or typname = ''text''') as t(f); create or replace function restore_df(text) returns setof record as ' unserialize(arg1) ' language 'plr'; select * from restore_df((select df from saved_df where id =1)) as t(oid oid, typname name); oid | typname -+- 25 | text 26 | oid (2 rows) HTH, Joe __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Storing data frame in a RDBMS
On Sat, 2005-06-04 at 13:18 -0700, Joe Conway wrote: Gabor Grothendieck wrote: On 6/4/05, Adam Witney [EMAIL PROTECTED] wrote: I am using PL/R in PostgreSQL amd have written some functions to build my data frame. However this can take some time with some large datasets and I would like to not have to repeat the process and so I would like to save the data frame. Rather than save/load into the file system I would like to be able to save the entire data frame as a single object in the database Is this possible? Check out ?serialize Looks like serialize should work nicely: create or replace function test_serialize(text) returns text as ' mydf - pg.spi.exec(arg1) return (serialize(mydf, NULL, ascii = TRUE)) ' language 'plr'; I was trying to do something similar but I needed to this from R itself. That is, I'd like to save a data.frame or a lm object to a DB. I looked at serialize but as it just takes a connection object I'm not sure as to how I would specify say a table in a given DB. I'm using R 2.0.1, PostgreSQL and Rdbi and Rdbi.PostgreSQL. I've looked at the docs and I see that there are helper functions to write a table containing text, numeric or boolean. But its not clear to me how I would write an arbitrary object to a DB. Any pointers would be appreciated. Thanks, --- Rajarshi Guha [EMAIL PROTECTED] http://jijo.cjb.net GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE --- Eureka! -- Archimedes __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Storing data frame in a RDBMS
On 6/4/05, Rajarshi Guha [EMAIL PROTECTED] wrote: On Sat, 2005-06-04 at 13:18 -0700, Joe Conway wrote: Gabor Grothendieck wrote: On 6/4/05, Adam Witney [EMAIL PROTECTED] wrote: I am using PL/R in PostgreSQL amd have written some functions to build my data frame. However this can take some time with some large datasets and I would like to not have to repeat the process and so I would like to save the data frame. Rather than save/load into the file system I would like to be able to save the entire data frame as a single object in the database Is this possible? Check out ?serialize Looks like serialize should work nicely: create or replace function test_serialize(text) returns text as ' mydf - pg.spi.exec(arg1) return (serialize(mydf, NULL, ascii = TRUE)) ' language 'plr'; I was trying to do something similar but I needed to this from R itself. That is, I'd like to save a data.frame or a lm object to a DB. I looked at serialize but as it just takes a connection object I'm not sure as to how I would specify say a table in a given DB. I'm using R 2.0.1, PostgreSQL and Rdbi and Rdbi.PostgreSQL. I've looked at the docs and I see that there are helper functions to write a table containing text, numeric or boolean. But its not clear to me how I would write an arbitrary object to a DB. Look at R example(serialize) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] glm with a distribution free family
Have you considered computing and plotting sample averages and variances for different subgroups of the data then plotting variances vs. averages? This was suggested by Fahrmeir and Tutz (2001) Multivariate Statistical Modeling Based on Generalized Linear Models, 2nd ed. (Springer, Example 2.7, p. 59-60). This analysis led them to consider variance(mu) = phi*mu, phi*mu^2 and mu+theta*mu^2 for a particular example. The variance function mu+theta*mu^2 corresponds to the negative binomial distribution, as noted by McCullagh and Nelder (1989) Generalized Linear Models, 2nd 3d. (Chapman Hall, Table 9.1, p. 326). A family function negative.binomial is included in library(MASS) and discussed in Venables and Ripley (2002) Modern Applied Statistics with S, 4th ed. (Springer, sec. 7.4, pp. 206-208). R script to perform this and other analyses in the first 60 pages of Fahrmeir and Tutz (2001) is attached, utilizing library(Fahrmeir) supported by Kjetil Halvorsen, who helped me with the attached. The r-help server will probably remove the attachment, but it should reach you, Leatitia, and Kjetil. Kjetil suggested he might include something like this is an update to library(Fahrmeir), perhaps in a demo subdirectory. Therefore, others interested in this might check there first and ping me and Kjetil if they don't find it on CRAN. I wish to thank Kjetil for his help with similar questions earlier -- and for maintaining the Farhmeir package on CRAN. I could not have answered this question 24 hours ago. Best Wishes, spencer graves Laetitia Mestdagh wrote: Dear R users, I am trying to fit a glm with a distribution free family, link = log and variance = constant*mu. I guess I have to use the quasi family but the choices of variance are restricted to constant or mu or mu^2..., I don't know the way to choose the variance that I need, i.e. constant*mu. If you have any ideas or advice, please tell me. Thanks, Laetitia Mestdagh Laetitia Mestdagh élève en 3ème année a l'ISUP( Université de Paris VI) 13 rue Ernest Chamblain 91250 St Germain les Corbeil Tel : 01 69 89 28 15 Portable : 06 18 44 53 26 - ils, photos et vidéos ! [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] data transformation
i have data fame da: da x y 1 1 a 2 2 a 3 3 a 4 4 a 5 5 a 6 6 b 7 7 b 8 8 b 9 9 b 10 10 b str(da) `data.frame': 10 obs. of 2 variables: $ x: num 1 2 3 4 5 6 7 8 9 10 $ y: Factor w/ 2 levels a,b: 1 1 1 1 1 2 2 2 2 2 and i want to generate new variable da$z,when y==a,da$z=da$x-mean(x[y==a]) ,when y==b,da$z=da$x-mean(x[y==b]). this data frame is simple and i can do it by hand,if the y has many levels and i have x1,x2 can i do it quickly? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] data transformation
Try: da$z - da$x - ave(da$x, da$y) da x y z 1 1 a -2 2 2 a -1 3 3 a 0 4 4 a 1 5 5 a 2 6 6 b -2 7 7 b -1 8 8 b 0 9 9 b 1 10 10 b 2 Andy From: ronggui i have data fame da: da x y 1 1 a 2 2 a 3 3 a 4 4 a 5 5 a 6 6 b 7 7 b 8 8 b 9 9 b 10 10 b str(da) `data.frame': 10 obs. of 2 variables: $ x: num 1 2 3 4 5 6 7 8 9 10 $ y: Factor w/ 2 levels a,b: 1 1 1 1 1 2 2 2 2 2 and i want to generate new variable da$z,when y==a,da$z=da$x-mean(x[y==a]) ,when y==b,da$z=da$x-mean(x[y==b]). this data frame is simple and i can do it by hand,if the y has many levels and i have x1,x2 can i do it quickly? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] data transformation
On 6/4/05, ronggui [EMAIL PROTECTED] wrote: i have data fame da: da x y 1 1 a 2 2 a 3 3 a 4 4 a 5 5 a 6 6 b 7 7 b 8 8 b 9 9 b 10 10 b str(da) `data.frame': 10 obs. of 2 variables: $ x: num 1 2 3 4 5 6 7 8 9 10 $ y: Factor w/ 2 levels a,b: 1 1 1 1 1 2 2 2 2 2 and i want to generate new variable da$z,when y==a,da$z=da$x-mean(x[y==a]) ,when y==b,da$z=da$x-mean(x[y==b]). this data frame is simple and i can do it by hand,if the y has many levels and i have x1,x2 can i do it quickly? Try this: da$z - resid(lm( x ~ y, da )) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html