Re: [R] R2WinBUGS error
Joseph Retzer wrote:

Dear R-help,

I'm using the R2WinBUGS package and getting an error message:

Error in file(file, "r") : unable to open connection
In addition: Warning message:
cannot open file 'codaIndex.txt', reason 'No such file or directory'

Looks like there was an error in WinBUGS: R cannot access the file because WinBUGS has not written it. You can use

bugs(..., debug = TRUE)

Then WinBUGS stays open and you can look at its error messages in order to correct the code/data/inits or whatever.

Uwe Ligges

I'm using R 2.2.1 and WinBUGS 1.4.1 on a Windows machine (XP). My R code and WinBUGS code are given below. The complete WinBUGS program executes correctly in WinBUGS; however, I'm generating some of my inits using WinBUGS, so I may be making an error there. On the other hand, the error generated in R seems to imply it cannot locate a file. I've checked my paths and they are correct. Also, my data is loading correctly.

Many thanks,
Joe

# R code
# Runs Bayesian ordered logit by calling WinBUGS from R
# ologit2.txt: WinBUGS commands

library(R2WinBUGS)
setwd("c:/docume~1/admini~1/mydocu~1/r_tuto~1")
load("oldat.Rdata")   # R data file containing data frame ol.dat
                      # with vars: q02, bf23f, bf23b, bf22, bf34a, bf34.1, bf34.2
q02    <- ol.dat$q02
bf23f  <- ol.dat$bf23f
bf23b  <- ol.dat$bf23b
bf22   <- ol.dat$bf22
bf34a  <- ol.dat$bf34a
bf34.1 <- ol.dat$bf34.1
bf34.2 <- ol.dat$bf34.2
N    <- nrow(ol.dat)
Ncut <- 5
data <- list(N, q02, bf23f, bf23b, bf22, bf34a, bf34.1, bf34.2, Ncut)
inits <- function() {
  list(k = c(-5, -4, -3, -2, -1), tau = 2, theta = rnorm(7, -1, 100))
}
parameters <- c("k")
olog.out <- bugs(data, inits, parameters,
                 model.file = "c:/Documents and Settings/Administrator/My Documents/r_tutorial/ologit2.txt",
                 n.chains = 2, n.iter = 1000,
                 bugs.directory = "c:/Program Files/WinBUGS14/")

# WinBUGS code
model exec;
{
  # Priors on regression coefficients
  theta[1] ~ dnorm(-1, 1.0) ; theta[2] ~ dnorm(-1, 1.0)
  theta[3] ~ dnorm( 1, 1.0) ; theta[4] ~ dnorm(-1, 1.0)
  theta[5] ~ dnorm(-1, 1.0) ; theta[6] ~ dnorm( 1, 1.0)
  theta[7] ~ dnorm(-1, 1.0)

  # Priors on latent variable cutpoints
  k[1] ~ dnorm(0, 0.1)I(, k[2]);     k[2] ~ dnorm(0, 0.1)I(k[1], k[3])
  k[3] ~ dnorm(0, 0.1)I(k[2], k[4]); k[4] ~ dnorm(0, 0.1)I(k[3], k[5])
  k[5] ~ dnorm(0, 0.1)I(k[4], )

  # Prior on precision
  tau ~ dgamma(0.001, 0.001)

  # Some defs
  sigma <- sqrt(1 / tau); log.sigma <- log(sigma);

  for (i in 1 : N) {
    # Prior on b
    b[i] ~ dnorm(0.0, tau)
    # Model mean
    mu[i] <- theta[1] + theta[2]*bf22[i] + theta[3]*bf23b[i] + theta[4]*bf23f[i] +
             theta[5]*bf34a[i] + theta[6]*bf34.1[i] + theta[7]*bf34.2[i]
    for (j in 1 : Ncut) {
      # Logit model: cumulative probability of lower response than j
      logit(Q[i, j]) <- -(k[j] + mu[i] + b[i])
    }
    # probability that response = j
    p[i, 1] <- max(min(1 - Q[i, 1], 1), 0)
    for (j in 2 : Ncut) { p[i, j] <- max(min(Q[i, j-1] - Q[i, j], 1), 0) }
    p[i, (Ncut+1)] <- max(min(Q[i, Ncut], 1), 0)
    q02[i] ~ dcat(p[i, ])
  }
}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
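A concrete version of Uwe's suggestion, reusing the call from the post unchanged apart from the debug flag (debug is a documented argument of R2WinBUGS::bugs()):

olog.out <- bugs(data, inits, parameters,
                 model.file = "c:/Documents and Settings/Administrator/My Documents/r_tutorial/ologit2.txt",
                 n.chains = 2, n.iter = 1000,
                 bugs.directory = "c:/Program Files/WinBUGS14/",
                 debug = TRUE)   # WinBUGS stays open so its log window can be inspected

With debug = TRUE, WinBUGS does not exit after sampling, so its log/trap window shows the actual model, data, or inits error that prevented codaIndex.txt from being written.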
Re: [R] rounding of voronoi vertices using deldir()
The problem is most likely your use of cat() for output. Consider:

x <- 8665540.49905558
cat(x, "\n")
8665540
cat(as.character(x), "\n")
8665540.49905558
options(digits=10)
cat(x, "\n")
8665540.499

So it would be best to do the conversions yourself, and I would investigate using format() to do so. But just changing options(digits=) may suffice. The 'digits' option in deldir rounds the output, and with the size of numbers you have, 10 decimal places is at or below representation accuracy. There are alternative R packages for Voronoi polygons, e.g. tripack.

On Wed, 5 Apr 2006, Mike Leahy wrote:

Hello list,

I'm just getting started with using R. Over the past day or so I have been trying to work out a method for generating Voronoi polygons for PostGIS using SQL. I was able to put together a procedure which works relatively well but is somewhat inefficient. Someone on the PostGIS list pointed me to the deldir() function in R, with which I can export a text file of x/y coordinates from a PostGIS table and write an SQL script with insert statements containing PostGIS ewkt-compatible geometries representing the Voronoi polygons generated by deldir(). The script I'm using is below. I've added a very small frac parameter to ensure none of the points are dropped from my dataset (which was actually happening for me with some fairly dispersed clusters of points), and a rectangular window that is much larger than the dataset itself to ensure that the Voronoi polygons extend far enough to actually cover all of the points in my dataset.

The problem I'm having is that the vertices at the corners of these polygons seem to have their coordinates rounded to one decimal place. Based on the documentation, the 'digits' option should allow me to change this, but I'm not getting anything different no matter what I set it to. Is there something obvious that I've missed? I realize that in most practical applications it's not a significant issue, but I'd prefer more accuracy, as I may be using Voronoi polygons to evaluate some statistical methods. Below are some sample coordinates from my points.txt file and the script I'm running in R to write out the Voronoi polygons. If anyone can see what's going on, I'd be happy to hear it, as the R calculations are much faster than the method I'm using within PostGIS/SQL:

points.txt:
22.462042329 | 8665540.49905558
270171.836250559 | 8667802.6446983
268895.572741816 | 8674257.75469324
270054.378262961 | 8666483.37597101
268402.641255299 | 8664853.87941629
265707.056272354 | 8665434.09025432
269985.118229025 | 8667743.14071004
269282.034045422 | 8665403.39312076

R:
library(deldir)
points <- scan(file="points.txt", what=list(x=0.0, y=0.0), sep="|")
# generate voronoi edges
voro <- deldir(points$x, points$y, digits=10, frac=0.001,
               rw=c(min(points$x)-abs(min(points$x)-max(points$x)),
                    max(points$x)+abs(min(points$x)-max(points$x)),
                    min(points$y)-abs(min(points$y)-max(points$y)),
                    max(points$y)+abs(min(points$y)-max(points$y))))
tiles <- tile.list(voro)   # combine edges into polygons
sink("voronoi.sql")        # redirect output to file
for (i in 1:length(tiles)) {   # write out polygons
  tile <- tiles[[i]]
  cat("insert into mytable (the_geom) values(geomfromtext('POLYGON((")
  for (j in 1:length(tile$x)) {
    cat(tile$x[[j]], ' ', tile$y[[j]], ",")
  }
  cat(tile$x[[1]], ' ', tile$y[[1]])   # close polygon
  cat("))',32718));\n")                # add SRID and newline
}
sink()   # back to terminal output
q()      # answer 'n' at the save-workspace prompt

Regards,
Mike

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
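A short illustration of the format()/sprintf() route suggested above, assuming the default options("digits") of 7; both avoid the rounding that cat() applies to numeric arguments:

x <- 8665540.49905558
cat(x, "\n")                        # 8665540 -- rounded via options("digits")
cat(format(x, digits = 15), "\n")   # 8665540.49905558
cat(sprintf("%.8f", x), "\n")       # 8665540.49905558 -- fixed 8 decimals

In the Voronoi script this means writing, e.g., cat(format(tile$x[[j]], digits = 15), ...) rather than cat(tile$x[[j]], ...).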
Re: [R] Uneven y-axis scale
Kåre Edvardsen wrote:

Dear R-gurus!

Is it possible within boxplot to break the y-scale into two (or more) pieces? I'd like to have a normal linear y-range from 0-10 and the next tick mark starting at, say, 60 and continuing to 90. The reason is for better visualising the outliers.

Hi Kare,

In the plotrix package there are two functions, gap.plot and gap.barplot, that do something like this. If you think that they are sufficiently close to what you want, I could have a try at writing a gap.boxplot function.

Jim

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
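A minimal sketch of what gap.plot does, assuming plotrix is installed and using its gap/gap.axis arguments (the gap gives the interval of the axis to cut out):

library(plotrix)
set.seed(1)
y <- c(runif(20, 0, 10), 65, 72, 88)        # mostly 0-10, a few outliers
gap.plot(seq_along(y), y, gap = c(12, 58),  # break the y-axis between 12 and 58
         gap.axis = "y", xlab = "index", ylab = "value")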
[R] imaging and contouring
Dear R'Helpers and Colleagues,

I have looked in the documentation and asked some colleagues in the lab, but was unable to solve the following difficulty. I am running R on an iMac G5 with OS 10.4. The file below (73 rows x 144 cols) shows the values of a climatic index on the globe on a 2.5° x 2.5° grid (NA = no value).

With image() and map(), running the following function:

dessin_i <- function(nomfichier="ERA40NA.dat", lat=73, long=144) {  # define the function
  library(maps)
  a <- read.table(nomfichier)           # read the file
  z <- array(0, dim=c(long, lat))       # create an array of zeros
  z <- t(a)                             # transpose matrix a into z
  z1 <- rbind(z[65:long,], z[1:64,])    # centre the map on Greenwich
  nlat <- seq(-90, 90, 2.5)             # latitudes (y axis) in 2.5° steps
  nlong <- seq(-180, 177.5, 2.5)        # longitudes (x axis) in 2.5° steps
  image(nlong, nlat, z1, col=rainbow(100), xlab="longitude", ylab="latitude")  # draw
  map(add=TRUE)                         # overlay the map of continents and countries
}

And then dessin_i(), I got the following figure (anonymous ftp): ftp://ftp.lodyc.jussieu.fr/ndelod/ND_Fig1.pdf

Running the following:

dessin_c <- function(nomfichier="ERA40NA.dat", lat=73, long=144) {  # define the function
  library(maps)                         # load the maps package
  a <- read.table(nomfichier)           # read the data
  z <- array(0, dim=c(long, lat))       # create an array of zeros
  z <- t(a)                             # transpose matrix a into z
  z1 <- rbind(z[65:long,], z[1:64,])    # centre the map on Greenwich
  nlat <- seq(-90, 90, 2.5)             # latitudes (y axis) in 2.5° steps
  nlong <- seq(-180, 177.5, 2.5)        # longitudes (x axis) in 2.5° steps
  contour(nlong, nlat, z1, col=rainbow(100), xlab="longitude", ylab="latitude")  # draw
  map(add=TRUE)                         # overlay the map of continents and countries
}

and dessin_c(), I got the following figure (anonymous ftp): ftp://ftp.lodyc.jussieu.fr/ndelod/ND_Fig2.pdf

Finally, running:

dessin_f <- function(nomfichier="ERA40NA.dat", lat=73, long=144) {  # define the function
  library(maps)                         # load the maps package
  a <- read.table(nomfichier)           # read the data
  z <- array(0, dim=c(long, lat))       # create an array of zeros
  z <- t(a)                             # transpose matrix a into z
  z1 <- rbind(z[65:long,], z[1:64,])    # centre the map on Greenwich
  nlat <- seq(-90, 90, 2.5)             # latitudes (y axis) in 2.5° steps
  nlong <- seq(-180, 177.5, 2.5)        # longitudes (x axis) in 2.5° steps
  filled.contour(nlong, nlat, z1, col=rainbow(100), xlab="longitude", ylab="latitude")  # draw
  map(add=TRUE)                         # overlay the map of continents and countries
}

and dessin_f(), I got the following figure (anonymous ftp): ftp://ftp.lodyc.jussieu.fr/ndelod/ND_Fig3.pdf

It may be similar to what I am looking for, i.e. the first figure with smoothed contours and a scale of the colours/values, but I have been unable to find the right options to put the map in the right place. Is it possible, and how:
1) to use filled.contour() without the scale?
2) to overlay the map so that it lines up with the coloured contours?
3) to obtain the colours of the first figure with the filled.contour function?
A subsidiary question: how can I obtain level lines that are thicker and/or in another colour only for specified values of z1?

Many thanks in advance.
Sincerely,

Nicolas Degallier
UMR 7159 / IRD UR182
Laboratoire d'Océanographie et du Climat, Expérimentation et Approches Numériques (LOCEAN)
Tour 45-55, 4e ét., case 100, 4 place Jussieu
75252 Paris Cedex 5, France
tel: (33) 01 44 27 51 57  fax: (33) 01 44 27 38 05
E-mail: [EMAIL PROTECTED]
pdf reprints (anonymous ftp): ftp://ftp.lodyc.jussieu.fr/ndelod

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
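On question 2: filled.contour() draws the legend and the main plot in separate viewports and rescales the plot region, which is why a later map(add=TRUE) lands in the wrong place. The documented way in is the plot.axes argument, which is evaluated inside the panel's coordinate system. A sketch with a toy field on the same 2.5° grid (the data file itself is not reproduced here):

library(maps)
nlong <- seq(-180, 177.5, 2.5)
nlat  <- seq(-90, 90, 2.5)
z1 <- outer(nlong, nlat, function(x, y) cos(y*pi/180) * sin(x*pi/180))  # toy values
filled.contour(nlong, nlat, z1, color.palette = rainbow,
               xlab = "longitude", ylab = "latitude",
               plot.axes = { axis(1); axis(2); map(add = TRUE) })

On question 1, filled.contour() has no option to suppress the scale; image() followed by contour(add=TRUE) is the usual substitute when no legend is wanted.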
[R] rbind
Hi list,

I have been trying to stack data frames with different column names using rbind, and I am getting the following error message. I was wondering if there is a way to get around this minor problem?

Error in match.names(clabs, names(xi)) : names don't match previous names: G

Thanks indeed for your information.

Regards,
Mahdi

--
Mahdi Osman (PhD)
E-mail: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] interpreting anova summary tables - newbie
Hello,

Apologies if this is the wrong list; I am a first-time poster here. I have an experiment in which an output is measured in response to 42 different categories. I am only interested in which of the categories are significantly different from a reference category. Here is the summary of the results:

summary(simple.fit)

Call:
lm(formula = as.numeric(as.vector(TNFa)) ~ Mutant.ID, data = imputed.data)

Residuals:
     Min       1Q   Median       3Q      Max
-238.459  -25.261   -0.868   25.660  309.496

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   49.0479    10.5971   4.628 5.08e-06 ***
Mutant.IDB   149.8070    23.1632   6.467 3.09e-10 ***
Mutant.IDC    98.7443    23.1632   4.263 2.55e-05 ***
Mutant.IDD    97.2203    23.1632   4.197 3.37e-05 ***
Mutant.IDE   118.9820    23.1632   5.137 4.49e-07 ***
Mutant.IDF   241.8537    23.1632  10.441  < 2e-16 ***
Mutant.IDG   107.4883    23.1632   4.640 4.80e-06 ***
Mutant.IDH   105.7664    23.1632   4.566 6.74e-06 ***
Mutant.IDI   517.4650    23.1632  22.340  < 2e-16 ***
Mutant.IDJ    19.        23.1632   0.854 0.393735
Mutant.IDK    47.4240    23.1632   2.047 0.041313 *
Mutant.IDL     3.2542    23.1632   0.140 0.888347
Mutant.IDM   180.9638    23.1632   7.813 5.63e-14 ***
Mutant.IDN    19.0582    23.1632   0.823 0.411155
Mutant.IDO    61.8684    23.1632   2.671 0.007891 **
Mutant.IDP    -0.5306    23.1632  -0.023 0.981738
Mutant.IDQ   -10.6972    23.1632  -0.462 0.644478
Mutant.IDR     1.5377    23.1632   0.066 0.947107
Mutant.IDS    14.6333    23.1632   0.632 0.527934
Mutant.IDT    48.8900    23.1632   2.111 0.035458 *
Mutant.IDU    58.9597    23.1632   2.545 0.011313 *
Mutant.IDV    81.7657    23.1632   3.530 0.000467 ***
Mutant.IDW    82.9576    23.1632   3.581 0.000386 ***
Mutant.IDY    49.1926    23.1632   2.124 0.034343 *
Mutant.IDZ    51.0381    23.1632   2.203 0.028170 *
Mutant.IDZA  116.0487    23.1632   5.010 8.38e-07 ***
Mutant.IDZB   56.4402    23.1632   2.437 0.015287 *
Mutant.IDZC  -14.5305    23.1632  -0.627 0.530838
Mutant.IDZD   -5.0069    23.1632  -0.216 0.828983
Mutant.IDZE    9.1176    23.1632   0.394 0.694080
Mutant.IDZF  232.2879    23.1632  10.028  < 2e-16 ***
Mutant.IDZG  -27.1671    23.1632  -1.173 0.241595
Mutant.IDZH    0.8757    23.1632   0.038 0.969862
Mutant.IDZI    4.7952    23.1632   0.207 0.836108
Mutant.IDZJ   -5.5859    23.1632  -0.241 0.809568
Mutant.IDZK  -12.9263    23.1632  -0.558 0.577138
Mutant.IDZL   38.8621    23.1632   1.678 0.094224 .
Mutant.IDZM   39.2643    23.1632   1.695 0.090880 .
Mutant.IDZN   73.8419    23.1632   3.188 0.001553 **
Mutant.IDZO  147.7804    23.1632   6.380 5.20e-10 ***
Mutant.IDZP    0.5654    23.1632   0.024 0.980540
Mutant.IDZQ   50.5117    23.1632   2.181 0.029824 *
Mutant.IDZR  217.6824    23.1632   9.398  < 2e-16 ***
Mutant.IDZS  237.3227    23.1632  10.246  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 61.79 on 377 degrees of freedom
Multiple R-Squared: 0.7351, Adjusted R-squared: 0.7049
F-statistic: 24.33 on 43 and 377 DF, p-value: < 2.2e-16

My question relates to the meaning of the p-values. Do the p-values relate to a) the confidence in the estimate, or b) the confidence that the non-intercept categories are different from the intercept? Somebody mentioned to me that the p-value for the intercept is the confidence in the estimate of the intercept, whereas the remaining entries are the confidence in each strain being different from the reference/intercept. Note the contrasts setting is contr.treatment.

Any help would be appreciated.

Andrew McDonagh, PhD Candidate,
Department of Infectious Diseases, Commonwealth Building, Hammersmith Hospital, Du Cane Road, London W12 ONN
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
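With contr.treatment, interpretation b) is the right one for the non-intercept rows: each tests whether that mutant's mean differs from the reference level's mean, while the intercept row tests whether the reference mean itself differs from zero. To test against a different reference, releveling the factor and refitting is enough; a sketch reusing the (assumed) names from the post:

imputed.data$Mutant.ID <- relevel(imputed.data$Mutant.ID, ref = "F")
simple.fit2 <- lm(as.numeric(as.vector(TNFa)) ~ Mutant.ID, data = imputed.data)
summary(simple.fit2)   # each coefficient row now tests a difference from mutant F

For 43 simultaneous comparisons, a multiplicity adjustment such as p.adjust(p, method = "holm") is worth considering.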
[R] Import .csv data query
Hi,

I am having trouble importing data from a .csv file into R. This is my current code:

returns <- read.zoo("empty143.csv", format = "%d/%b/%Y", sep = ",", header = TRUE, skip = 1)

for importing data of this type:

Request4*, ,fx.TWD.USD.SPOT.soho
01/Jan/1988,0.03502627
04/Jan/1988,0.03502627
05/Jan/1988,0.03502627
06/Jan/1988,0.035038542

The error is:

Error in read.zoo("empty143.csv", format = "%d/%b/%Y", sep = ",", header = TRUE, : index contains NAs

I guess my syntax is just wrong. The next code lines would be:

returns <- window(returns, start = as.Date("01/Jan/1988"))
returns <- window(returns, end = as.Date("08/Jun/1988"))
retdat <- as.data.frame(returns)
attach(retdat)
retdat

Any help would be more than greatly appreciated.

Cheers,
Nico

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
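Two likely culprits, sketched under the assumption that the file matches the sample above: with skip = 1 plus header = TRUE, the first observation row is consumed as the header; and "%d/%b/%Y" depends on the locale's month abbreviations, so Jan/Feb/... only parse under an English LC_TIME. A hedged fix:

library(zoo)
Sys.setlocale("LC_TIME", "C")   # English month abbreviations for %b
returns <- read.zoo("empty143.csv", format = "%d/%b/%Y",
                    sep = ",", header = FALSE, skip = 1)
returns <- window(returns, start = as.Date("1988-01-01"),
                  end = as.Date("1988-06-08"))

Note also that as.Date("01/Jan/1988") in the follow-up lines needs the same format argument, e.g. as.Date("01/Jan/1988", format = "%d/%b/%Y").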
Re: [R] Problems in package management after Linux system upgrade
On 05/04/06, Paul Johnson [EMAIL PROTECTED] wrote:

Hi Paul,

I upgraded from Fedora Core 4 to Fedora Core 5 and I find that a lot of previously installed packages won't run because shared libraries or other system things have changed out from under the installed R libraries. I do not know for sure if the R version now in Fedora Extras (2.2.1) is exactly the same one I was using in FC4.

It is. I expect also that as soon as 2.3.0 is released it will become available for FC-4, FC-5 and devel.

I see problems in many packages. Example, Hmisc:

unable to load shared library '/usr/lib/R/library/Hmisc/libs/Hmisc.so': libgfortran.so.0: cannot open shared object file: No such file or directory
Error in library(Hmisc) : .First.lib failed for 'Hmisc'

To avoid this problem I am submitting R packages to Extras. I maintain some and I intend to submit more; I have posted here a python script to convert an R package into a src.rpm that follows the Extras packaging policy. All help is welcome. :-)

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

--
José Abílio

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
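After a distribution upgrade like this, packages with compiled code generally need rebuilding against the new system libraries; installing a compat libgfortran RPM only papers over the symptom. A sketch of the rebuild from within R (run as a user who can write to the library):

# reinstall packages whose binaries predate the current R build
update.packages(checkBuilt = TRUE, ask = FALSE)
# or reinstall an individual casualty from source
install.packages("Hmisc")

Note that checkBuilt = TRUE only catches packages built under a different R version, so a package broken purely by a changed system library may still need the explicit install.packages() call.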
Re: [R] rbind
Hi,

if you are really sure that all columns in all the data frames you want to stack are of the same type and in the same position, just make the names in all data frames equal:

df1 <- data.frame(rnorm(10), rnorm(10))
df2 <- data.frame(rnorm(10), rnorm(10))
names(df2) <- c("a", "b")
df1
   rnorm.10. rnorm.10..1
1 -0.1645236   0.3981059
2 -0.2533617  -0.6120264
...
df2
            a            b
1  2.40161776  0.475509529
2 -0.03924000 -0.709946431
...
rbind(df1, df2)
Error in match.names(clabs, names(xi)) : names don't match previous names: a, b
names(df1) <- names(df2)
rbind(df1, df2)
            a            b
1 -0.16452360  0.398105880
2 -0.25336168 -0.612026393
3  0.69696338  0.341119691
4  0.55666320 -1.129363096
5 -0.68875569  1.433023702
...

HTH
Petr

On 6 Apr 2006 at 9:46, Mahdi Osman wrote:

Date sent: Thu, 6 Apr 2006 09:46:56 +0200 (MEST)
From: Mahdi Osman [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Subject: [R] rbind

Hi list, I have been trying to stack data frames with different column names using rbind. I am getting the following error message. I was wondering if there is a way to get around this minor problem?

Error in match.names(clabs, names(xi)) : names don't match previous names: G

Thanks indeed for your information.

Regards, Mahdi

--
Mahdi Osman (PhD)
E-mail: [EMAIL PROTECTED]

Petr Pikal [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
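If the data frames genuinely contain different variables (not just different names for the same columns), the other common route is to pad each frame with the other's missing columns as NA before stacking, since rbind() itself insists on identical names. A sketch (rbind.fill.na is a made-up helper name):

rbind.fill.na <- function(d1, d2) {
  d1[setdiff(names(d2), names(d1))] <- NA   # add columns only d2 has
  d2[setdiff(names(d1), names(d2))] <- NA   # add columns only d1 has
  rbind(d1, d2[names(d1)])                  # align column order, then stack
}
df1 <- data.frame(a = 1:2, b = 3:4)
df2 <- data.frame(a = 5:6, G = 7:8)
rbind.fill.na(df1, df2)   # b is NA for df2's rows, G is NA for df1's rows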
Re: [R] skipping rows in trellis key
Hi, Steve,

I think you can trick it by adding an empty level with a small cex:

keyArgs <- list(points=list(pch=c(NA, rep(17, 5), NA), lwd=2,
                            col=c(NA, c("red", "chartreuse3", "black", "cyan", "blue"), NA)),
                text=list(lab=c("S-R Mapping", "Color", "Shape", "Letter",
                                "Compatible", "Incompatible", ""),
                          cex=c(1.2, 1, 1, 1, 1, 1, .2)),
                text=list(lab=c(expression(R^2), as.character(rep(0.999, 5)), ""),
                          cex=c(1.2, 1, 1, 1, 1, 1, .2)))

HTH,
--sundar

Steven Lacey wrote:

Try this...

xyplot(y ~ x, data=data.frame(x=1:10, y=1:10))
keyArgs <- list()
keyArgs <- list(points=list(pch=c(NA, rep(17, 5)), lwd=2,
                            col=c(NA, c("red", "chartreuse3", "black", "cyan", "blue"))),
                text=list(lab=c("S-R Mapping", "Color", "Shape", "Letter",
                                "Compatible", "Incompatible"),
                          cex=c(1.2, 1, 1, 1, 1, 1)),
                text=list(lab=c(expression(R^2), as.character(rep(0.999, 5))),
                          cex=c(1.2, 1, 1, 1, 1, 1)))
keyArgs$between <- c(1, 0, 5)
keyArgs$columns <- 1
keyArgs$column.between <- 1
keyArgs$border <- TRUE
drawKeyArgs <- list(key=keyArgs, draw=FALSE)
keyArgs$draw <- FALSE
key <- do.call(draw.key, drawKeyArgs)
vp1 <- viewport(x=1, y=1, height=unit(1, "grobheight", key),
                width=unit(1, "grobwidth", key), just=c("right", "top"))
pushViewport(vp1)
grid.draw(key)
popViewport()

Steve

-----Original Message-----
From: Deepayan Sarkar [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 05, 2006 11:53 PM
To: Steven Lacey
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] skipping rows in trellis key

On 4/5/06, Steven Lacey [EMAIL PROTECTED] wrote:

Yes, that works! Thanks!

On a related note... When I draw my key there is not sufficient padding between the last line of the key and the border line. For instance, the vertical line in "p" often drops below the border. Is there an easy way to add padding between the last row of the key and the border?

Example please.

Deepayan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] pelora extension to continuous response problem
Hi,

I'm using Marcel Dettling's supclust package. I have a typical numeric matrix x of explanatory variables (n cases, p features, n < p) and a numeric vector y of length n containing a quantitative response variable. I am interested in grouping features in a way that is strongly predictive of the response y (as in the PLS approach). The paper describing Pelora says that Pelora can easily be adapted to this case; however, it seems to me that no explicit indication is given in the supclust user manual. Should I simply call

aa <- pelora(x_training, y)

and then

predict(aa, x_test, type="probs")

?

Thank you in advance,
rossana

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Help on hypothesis testing
Hi,

I hope to use R to perform some hypothesis tests, such as t.test() and var.test(). However, I cannot find a function that tests a mean when the variance is known (i.e., the u-test or z-test of some textbooks), nor one that tests the variance of a normal distribution (i.e., the chi-squared test).

Thanks in advance for any hints.

Best wishes,
Jinsong

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] polynomial predict with lme
Does lme prediction work correctly with poly() terms? In the following simulated example the predictions are wildly off. Or am I doing something daft?

Milk yield for five cows is measured weekly for 45 weeks. Yield is simulated as a cubic function of week number + a random cow effect (on the intercept) + residual error. I want to recover an estimate of the fixed curve.

###
library(nlme)
set.seed(1)
ncows <- 5; nweeks <- 45; week <- 1:nweeks
mcurve <- 25 + 0.819*week - 0.0588*week^2 + 0.000686*week^3
cow.eff <- rnorm(ncows)
week <- rep(week, ncows)
cow <- gl(ncows, nweeks)
yield <- mcurve + cow.eff[cow] + rnorm(ncows*nweeks)
lmefit <- lme(yield ~ poly(week, 3), random = ~ 1 | cow)
summary(lmefit)   # seems OK
someweeks <- seq(5, 45, by=5)
new <- data.frame(week=someweeks)
predicts <- predict(lmefit, new, level=0)
print(predicts)   # not even close
#plot(week, yield, las=1)
#lines(someweeks, predicts)
###

--
I.White
University of Edinburgh
Ashworth Laboratories, West Mains Road
Edinburgh EH9 3JT
Fax: 0131 650 6564  Tel: 0131 650 5490
E-mail: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help on hypothesis testing
On 4/6/06 7:17 AM, Jinsong Zhao [EMAIL PROTECTED] wrote: Hi, I hope to use R to perform some hypothesis testing, such as t.test() and var.test(). However, I don't find a function that could do test on means with variance known (i.e., u test or z test in some textbook), and a function that could do test on variances of normal distribution (i.e. chi-squared test). See ?pnorm and ?pchisq, if I understand you correctly. Sean __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
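Both tests are short enough to write directly from pnorm()/pchisq(); a sketch of two-sided versions (the function names here are made up):

# one-sample z test of H0: mu = mu0, with sigma known
z.test <- function(x, mu0, sigma) {
  z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
  c(z = z, p.value = 2 * pnorm(-abs(z)))
}

# chi-squared test of H0: sigma^2 = s0sq, for a normal sample
var.test1 <- function(x, s0sq) {
  n <- length(x)
  stat <- (n - 1) * var(x) / s0sq
  p <- 2 * min(pchisq(stat, n - 1), pchisq(stat, n - 1, lower.tail = FALSE))
  c(X.squared = stat, p.value = p)
}

set.seed(1)
z.test(rnorm(30, mean = 0.2), mu0 = 0, sigma = 1)
var.test1(rnorm(30, sd = 2), s0sq = 4)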
[R] Sort Problem
Hi All,

I have certain combinations, each with an associated value, e.g.:

0 1 20
0 2 15
1 1 40
1 2 52

Now I need to sort this list to find the combination with the lowest value. In this case that value is 15, and the combination is 0 2.

--
SUMANTA BASAK.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Sort Problem
On 4/6/06 8:04 AM, Sumanta Basak [EMAIL PROTECTED] wrote:

Hi All, I have certain combinations, each with an associated value, e.g.:

0 1 20
0 2 15
1 1 40
1 2 52

Now I need to sort this list to find the combination with the lowest value. In this case that value is 15, and the combination is 0 2.

See ?order. You can use the ordering that you get from order() to grab the appropriate entry in your matrix/data.frame.

Sean

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] polynomial predict with lme
I don't believe predict.lme implements makepredictcall (which is the magic used in normal model fitting), nor does it use model.frame.default. So the answer would appear to be that there is no reason to expect it to work with poly(). See http://developer.r-project.org/model-fitting-functions.txt for the standard paradigm.

On Thu, 6 Apr 2006, i.m.s.white wrote:

Does lme prediction work correctly with poly() terms? In the following simulated example the predictions are wildly off. Or am I doing something daft?

Milk yield for five cows is measured weekly for 45 weeks. Yield is simulated as a cubic function of week number + a random cow effect (on the intercept) + residual error. I want to recover an estimate of the fixed curve.

###
library(nlme)
set.seed(1)
ncows <- 5; nweeks <- 45; week <- 1:nweeks
mcurve <- 25 + 0.819*week - 0.0588*week^2 + 0.000686*week^3
cow.eff <- rnorm(ncows)
week <- rep(week, ncows)
cow <- gl(ncows, nweeks)
yield <- mcurve + cow.eff[cow] + rnorm(ncows*nweeks)
lmefit <- lme(yield ~ poly(week, 3), random = ~ 1 | cow)
summary(lmefit)   # seems OK
someweeks <- seq(5, 45, by=5)
new <- data.frame(week=someweeks)
predicts <- predict(lmefit, new, level=0)
print(predicts)   # not even close
#plot(week, yield, las=1)
#lines(someweeks, predicts)
###

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
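A workaround consistent with that diagnosis is to keep the data-dependent poly() basis out of the lme formula, e.g. by fitting the raw polynomial, after which predict() on new data reconstructs the same regressors:

lmefit2 <- lme(yield ~ week + I(week^2) + I(week^3), random = ~ 1 | cow)
predict(lmefit2, data.frame(week = seq(5, 45, by = 5)), level = 0)

The raw-coefficient fit is numerically less stable than the orthogonal basis but describes the same fitted curve, and the level-0 predictions should now track mcurve.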
[R] weighted kernel density estimate
Dear R-users,

I intend to do a spatial analysis of the genetic structuring within a population. For this I had thought to prepare a kernel density estimate map showing the spatial distribution of individuals while incorporating the genetic distances among individuals. I have a dataset of locations of N unique individuals (XY coordinates) and an NxN matrix with the genetic distances given as fractions between 0 and 1. As far as I understand the methodology, a kernel density estimate works with the geographic distance matrix. My idea was to somehow incorporate the genetic distance matrix (e.g. as an among-individual-based smoothing factor???) in the estimation. Does anyone know if this is possible? To me it sounds like a logical inclusion which may be interesting for a wide variety of topics (i.e., not all individuals are equal). I hope someone knows of a way to proceed.

Thanks in advance,

Cheers,
Roel May

Roel May
Norwegian Institute for Nature Research (NINA)
Tungasletta 2, NO-7485 Trondheim, Norway
Tlf. +47 73 80 14 65, Mob. +47 95 78 59 95
Email [EMAIL PROTECTED]
Internett www.nina.no, www.jerv.info

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Sort Problem
From: Sean Davis

On 4/6/06 8:04 AM, Sumanta Basak [EMAIL PROTECTED] wrote:

Hi All, I have certain combinations, each with an associated value, e.g.:

0 1 20
0 2 15
1 1 40
1 2 52

Now I need to sort this list to find the combination with the lowest value. In this case that value is 15, and the combination is 0 2.

See ?order. You can use the ordering that you get from order() to grab the appropriate entry in your matrix/data.frame.

Sean

If Sumanta just wants the entry with the minimum, there's no need to sort the whole thing. which.min() should work.

Andy

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
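For completeness, both routes in a few lines (d stands for the poster's table):

d <- data.frame(a = c(0, 0, 1, 1), b = c(1, 2, 1, 2), value = c(20, 15, 40, 52))
d[which.min(d$value), ]   # the single row with the minimum: 0 2 15
d[order(d$value), ]       # the whole table sorted by value, if that is wanted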
Re: [R] Uneven y-axis scale
It is useful to me. Thank you! May I ask another question? I have data with many points on a smaller scale (e.g. 0-10) and few points on a larger scale (e.g. 10-1000). If I want a normal linear y-scale from 0-10 and the other points (10-1000) displayed over a shorter distance, can I do that with the package? How?

Thanks!

On 4/6/06, Jim Lemon [EMAIL PROTECTED] wrote:

Kåre Edvardsen wrote:

Dear R-gurus!

Is it possible within boxplot to break the y-scale into two (or more) pieces? I'd like to have a normal linear y-range from 0-10 and the next tick mark starting at, say, 60 and continuing to 90. The reason is for better visualising the outliers.

Hi Kare,

In the plotrix package there are two functions, gap.plot and gap.barplot, that do something like this. If you think that they are sufficiently close to what you want, I could have a try at writing a gap.boxplot function.

Jim

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] weighted kernel density estimate
If you don't insist on kernel smoothing, and are willing to use something similar, locfit() in the locfit package uses local likelihood to estimate density and can accept weights. E.g., library(locfit) plot(locfit(~Petal.Length + Petal.Width, data=iris)) plot(locfit(~Petal.Length + Petal.Width, data=iris, weights=rep(1:3, each=50))) Andy From: May, Roel Dear R-users, I intend to do a spatial analysis on the genetic structuring within a population. For this I had thought to prepare a kernel density estimate map showing the spatial distribution of individuals, while incorporating the genetic distances among individuals. I have a dataset of locations of N unique individuals (XY-coordinates) and an NxN matrix with the genetic distances given as a fraction between 0 and 1. As far as I understand the methodology, a kernel density estimate works with the geographic distance matrix. My idea was to somehow incorporate the genetic distance matrix (e.g. as an among-individual-based smoothing factor???) in the estimation. Does anyone know if this is possible? To me it sounds a logical inclusion which may be interesting for a wide variety of topics (i.e., not all individuals are equal). I hope someone knows of any way to proceed. Thanks in advance, Cheers Roel May Roel May Norwegian Institute for Nature Research (NINA) Tungasletta 2, NO-7485 Trondheim, Norway Tlf. +47 73 80 14 65, Mob. +47 95 78 59 95 Email [EMAIL PROTECTED] Internett www.nina.no, www.jerv.info __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] key position in trellis plotting area
Hi,

I want to do the following:
1) create a trellis plot with a 1 x 1 layout
2) add a key in the upper right-hand corner of the plotting region (i.e., the panel), but after the initial call to trellis
3) resize the resulting graphics device without changing the relative position of the key

For instance, the code below draws the key relative to the device window, not the plotting area.

xyplot(y ~ x, data=data.frame(x=1:10, y=1:10),
       par.settings=list(background="white"), col="black")
keyArgs <- list()
keyArgs <- list(points=list(pch=17, lwd=2,
                            col=c("transparent", "red", "chartreuse3", "black",
                                  "cyan", "blue", "transparent")),
                text=list(lab=c("S-R Mapping", "Color", "Shape", "Letter",
                                "Compatible", "Incompatible", ""),
                          cex=c(1.2, 1, 1, 1, 1, 1, 0.2)),
                text=list(lab=c(expression(R^2), as.character(rep(0.999, 5)), ""),
                          cex=c(1.2, 1, 1, 1, 1, 1, 0.2)))
keyArgs$between <- c(1, 0, 5)
keyArgs$background <- "white"
keyArgs$border <- TRUE
drawKeyArgs <- list(key=keyArgs, draw=FALSE)
keyArgs$draw <- FALSE
key <- do.call(draw.key, drawKeyArgs)
vp1 <- viewport(x=1, y=1, height=unit(1, "grobheight", key),
                width=unit(1, "grobwidth", key), just=c("right", "top"))
pushViewport(vp1)
grid.draw(key)
popViewport()

If I embed the call to draw.key in the panel function, where presumably the plotting area is the current viewport, then it works.

panel.test <- function(x, y, ...) {
  panel.xyplot(x, y, ...)
  keyArgs <- list()
  keyArgs <- list(points=list(pch=17, lwd=2,
                              col=c("transparent", "red", "chartreuse3", "black",
                                    "cyan", "blue", "transparent")),
                  text=list(lab=c("S-R Mapping", "Color", "Shape", "Letter",
                                  "Compatible", "Incompatible", ""),
                            cex=c(1.2, 1, 1, 1, 1, 1, 0.2)),
                  text=list(lab=c(expression(R^2), as.character(rep(0.999, 5)), ""),
                            cex=c(1.2, 1, 1, 1, 1, 1, 0.2)))
  keyArgs$between <- c(1, 0, 5)
  keyArgs$background <- "white"
  keyArgs$border <- TRUE
  drawKeyArgs <- list(key=keyArgs, draw=FALSE)
  keyArgs$draw <- FALSE
  key <- do.call(draw.key, drawKeyArgs)
  vp1 <- viewport(x=1, y=1, height=unit(1, "grobheight", key),
                  width=unit(1, "grobwidth", key), just=c("right", "top"))
  pushViewport(vp1)
  grid.draw(key)
  popViewport()
}
xyplot(y ~ x, data=data.frame(x=1:10, y=1:10),
       par.settings=list(background="white"), col="black", panel=panel.test)

But I don't want to write panel functions that are specific to the key. I'd like to draw the plot, access the plotting-area viewport, push it, and add the key. Something may be possible with baseViewports, but I can't get that to work properly with a 1 x 1 lattice plot, and the relative position of the key changes when the device is rescaled. Is there an easy way to do this in the initial call to xyplot? My reading of the documentation is that a key may be positioned in the npc coordinates of the device region, but not the plotting region. Is that correct?

Thanks,
Steve

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] weighted kernel density estimate
On Thu, 2006-04-06 at 14:29 +0200, May, Roel wrote:

Dear R-users, I intend to do a spatial analysis of the genetic structuring within a population. For this I had thought to prepare a kernel density estimate map showing the spatial distribution of individuals while incorporating the genetic distances among individuals. I have a dataset of locations of N unique individuals (XY coordinates) and an NxN matrix with the genetic distances given as fractions between 0 and 1. As far as I understand the methodology, a kernel density estimate works with the geographic distance matrix. My idea was to somehow incorporate the genetic distance matrix (e.g. as an among-individual-based smoothing factor???) in the estimation. Does anyone know if this is possible? To me it sounds like a logical inclusion which may be interesting for a wide variety of topics (i.e., not all individuals are equal). I hope someone knows of a way to proceed. Thanks in advance,

Dear Roel -- it is not entirely clear what you wish to achieve. Sampling weights associated with a unit can be incorporated at least in the locfit package, which also allows you to do a 2-dimensional density estimate, but this does not sound like what you are interested in. From your description, it sounds like you have a data set which consists of 1/2*N*(N-1) unique pairs of individuals with data

x1 y1 x2 y2 w

where (x1, y1) and (x2, y2) are the (x, y) coordinates and w is the genetic distance between the two (there are only 1/2*N*(N-1) on the assumption that the genetic distance for any i, j individuals is symmetric, so w(i,j) = w(j,i) and w(i,i) = 1). Are you interested in plotting, on a map of (x, y) coordinates, some measure of how genetically related the population at that coordinate is with its surroundings? I.e., the value at x1, y1 of w summarised by a kernel function over all x2, y2?

Best wishes,

Markus

Cheers Roel May

Roel May
Norwegian Institute for Nature Research (NINA)
Tungasletta 2, NO-7485 Trondheim, Norway
Tlf. +47 73 80 14 65, Mob. +47 95 78 59 95
Email [EMAIL PROTECTED]
Internett www.nina.no, www.jerv.info

--
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] R CMD check for packages in a bundle
Hi

[Mac OS X 10.4.6; R 2.2.1]

I have a bundle that comprises three packages. I want to run R CMD check on each one individually, as it takes a long time to run on all three, but I am having problems. The bundle as a whole passes R CMD check, but the check fails when I cd to the bundle directory and run R CMD check on a package directory.

The whole bundle passes:

octopus:~/scratch% R CMD check ./BACCO
* checking for working latex ... OK
* using log directory '/Users/rksh/scratch/BACCO.Rcheck'
* using R version 2.2.1, 2005-12-20
* checking for file 'BACCO/DESCRIPTION' ... OK
* looks like 'BACCO' is a package bundle
* this is bundle 'BACCO' version '1.0-29'
* checking if this is a source bundle ... OK
[snip]
** creating approximator-manual.tex ... OK
** checking approximator-manual.tex ... OK
octopus:~/scratch%

but R CMD check fails on the packages individually:

octopus:~/scratch/BACCO% R CMD check ./calibrator
* checking for working latex ... OK
* using log directory '/Users/rksh/scratch/BACCO/calibrator.Rcheck'
* using R version 2.2.1, 2005-12-20
* checking for file 'calibrator/DESCRIPTION' ... OK
* looks like 'calibrator' is a package bundle
* this is bundle 'BACCO' version '1.0-29'
* checking if this is a source bundle ... OK
/Library/Frameworks/R.framework/Resources/bin/INSTALL: line 1: cd: /Users/rksh/scratch/BACCO/calibrator/emulator: No such file or directory
/Library/Frameworks/R.framework/Resources/bin/INSTALL: line 1: cd: /Users/rksh/scratch/BACCO/calibrator/calibrator: No such file or directory
/Library/Frameworks/R.framework/Resources/bin/INSTALL: line 1: cd: /Users/rksh/scratch/BACCO/calibrator/approximator: No such file or directory
ERROR: no 'Package' field in 'DESCRIPTION'
ERROR
Installation failed.
octopus:~/scratch/BACCO%

Both the DESCRIPTION and DESCRIPTION.in files seem to be OK:

octopus:~/scratch/BACCO/calibrator% cat DESCRIPTION
Package: calibrator
Description: Performs Bayesian calibration of computer models as per [snip]
Title: Bayesian calibration of computer models
Bundle: BACCO
Version: 1.0-29
Date: 13 October 2005
Depends: R (>= 2.0.0), mvtnorm, adapt
Contains: emulator calibrator approximator
Title: Bayesian Analysis of Computer Code Output
BundleDescription: Bayesian analysis of computer code software. The bundle contains routines that evaluate the formulae of Kennedy, O'Hagan, and Oakley.
License: GPL
Author: Robin K. S. Hankin [EMAIL PROTECTED]
Maintainer: Robin K. S. Hankin [EMAIL PROTECTED]
URL: http://www.noc.soton.ac.uk/JRD/SAT/pers/rksh.html
octopus:~/scratch/BACCO/calibrator% cat DESCRIPTION.in
Package: calibrator
Description: Performs Bayesian calibration of computer models as per [snip]
Title: Bayesian calibration of computer models

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] weighted kernel density estimate
Dear Markus,

I indeed have a data set consisting of 1/2*N*(N-1) unique pairs of individuals with data

x1 y1 x2 y2 w

I am, however, not interested in, as you said, the value at x1, y1 of w summarised by a kernel function over all x2, y2 (if I understand you rightly, that is). That sounds like doing a weighted kernel density estimate as seen from the 'viewpoint' of a certain location/individual, which I know could be done with sm.density in the sm library.

I am interested in creating a map depicting the total structuring in the entire population, based on both geographic and genetic distances. This means that the evaluation/interpolation at each location has to take into account both the geographic AND the genetic distance matrix (both with 1/2*N*(N-1) unique combinations). To clarify a bit more: such a map could show, for example, differentiation in the population (of wolverines, by the way) because of large geographic distances OR because of large genetic distances.

I have quickly checked locfit, but I am not sure whether it would work for me.

Roel

-----Original Message-----
From: Markus Jantti [mailto:[EMAIL PROTECTED]]
Sent: 6. april 2006 14:56
To: r-help@stat.math.ethz.ch
Cc: May, Roel
Subject: Re: [R] weighted kernel density estimate

On Thu, 2006-04-06 at 14:29 +0200, May, Roel wrote:

Dear R-users, I intend to do a spatial analysis of the genetic structuring within a population. For this I had thought to prepare a kernel density estimate map showing the spatial distribution of individuals while incorporating the genetic distances among individuals. I have a dataset of locations of N unique individuals (XY coordinates) and an NxN matrix with the genetic distances given as fractions between 0 and 1. As far as I understand the methodology, a kernel density estimate works with the geographic distance matrix. My idea was to somehow incorporate the genetic distance matrix (e.g. as an among-individual-based smoothing factor???) in the estimation. Does anyone know if this is possible? To me it sounds like a logical inclusion which may be interesting for a wide variety of topics (i.e., not all individuals are equal). I hope someone knows of a way to proceed. Thanks in advance,

Dear Roel -- it is not entirely clear what you wish to achieve. Sampling weights associated with a unit can be incorporated at least in the locfit package, which also allows you to do a 2-dimensional density estimate, but this does not sound like what you are interested in. From your description, it sounds like you have a data set which consists of 1/2*N*(N-1) unique pairs of individuals with data

x1 y1 x2 y2 w

where (x1, y1) and (x2, y2) are the (x, y) coordinates and w is the genetic distance between the two (there are only 1/2*N*(N-1) on the assumption that the genetic distance for any i, j individuals is symmetric, so w(i,j) = w(j,i) and w(i,i) = 1). Are you interested in plotting, on a map of (x, y) coordinates, some measure of how genetically related the population at that coordinate is with its surroundings? I.e., the value at x1, y1 of w summarised by a kernel function over all x2, y2?

Best wishes,

Markus

Cheers Roel May

Roel May
Norwegian Institute for Nature Research (NINA)
Tungasletta 2, NO-7485 Trondheim, Norway
Tlf. +47 73 80 14 65, Mob. +47 95 78 59 95
Email [EMAIL PROTECTED]
Internett www.nina.no, www.jerv.info

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

--
Markus Jantti
Abo Akademi University
[EMAIL PROTECTED]
http://www.iki.fi/~mjantti

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
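If no packaged smoother fits, a per-point-weighted 2-D Gaussian kernel estimate is also easy to hand-roll; the modelling decision is how to collapse the NxN genetic-distance matrix into one weight per individual (e.g. each individual's mean distance to all others), which the smoother itself cannot decide. A self-contained sketch:

# weighted 2-D Gaussian KDE evaluated on a grid
# x, y: coordinates; w: one non-negative weight per individual; h: bandwidth
wkde2d <- function(x, y, w, h, gx, gy) {
  w <- w / sum(w)
  f <- function(px, py) sum(w * dnorm(px - x, sd = h) * dnorm(py - y, sd = h))
  outer(gx, gy, Vectorize(f))
}

set.seed(1)
x <- rnorm(50); y <- rnorm(50)
w <- runif(50)                    # stand-in for per-individual weights
gx <- seq(-3, 3, length = 60); gy <- gx
image(gx, gy, wkde2d(x, y, w, h = 0.5, gx, gy))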
[R] convert a data frame to matrix - changed column name
I have a question which is very easy to solve, but I can't find a solution. I want to convert a data frame to a matrix. Here is my toy example:

L3 <- c(1:3)
L10 <- c(1:6)
d <- data.frame(cbind(x=c(10,20), y=L10), fac=sample(L3, 6, repl=TRUE))
d
   x y fac
1 10 1   1
2 20 2   1
3 10 3   1
4 20 4   3
5 10 5   2
6 20 6   2
is.data.frame(d)
[1] TRUE
sapply(d, function(x) unlist(x, use.names=FALSE))
      x y fac
[1,] 10 1   1
[2,] 20 2   1
[3,] 10 3   1
[4,] 20 4   3
[5,] 10 5   2
[6,] 20 6   2
is.matrix(sapply(d, function(x) unlist(x, use.names=FALSE)))
[1] TRUE

Yes, I get a matrix, TRUE. But I need to change the column names to the form [,1] [,2] [,3]. I need the result like:

     [,1] [,2] [,3]
[1,]   10    1    1
[2,]   20    2    1
[3,]   10    3    1
[4,]   20    4    3
[5,]   10    5    2
[6,]   20    6    2

How can I do that?

Thanks in advance,
Muhammad Subianto

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] convert a data frame to matrix - changed column name
Hi

set the column names to NULL:

a <- data.frame(x=1:4, y=4:1)
aa <- as.matrix(a)
colnames(aa) <- NULL
aa
  [,1] [,2]
1    1    4
2    2    3
3    3    2
4    4    1

best wishes

Robin

On 6 Apr 2006, at 15:16, Muhammad Subianto wrote:

I have a question which is very easy to solve, but I can't find a solution. I want to convert a data frame to a matrix. [...] I need to change the column names to the form [,1] [,2] [,3]. How can I do that?

Thanks in advance, Muhammad Subianto

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] convert a data frame to matrix - changed column name
try the following:

out <- data.matrix(d)
dimnames(out) <- NULL
out

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: Muhammad Subianto [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Thursday, April 06, 2006 4:16 PM
Subject: [R] convert a data frame to matrix - changed column name

I have a question which is very easy to solve, but I can't find a solution. I want to convert a data frame to a matrix. [...] I need to change the column names to the form [,1] [,2] [,3]. How can I do that?

Thanks in advance, Muhammad Subianto

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] convert a data frame to matrix - changed column name
On this day 06/04/2006 16:22, Robin Hankin wrote:

Hi, set the column names to NULL:

a <- data.frame(x=1:4, y=4:1)
aa <- as.matrix(a)
colnames(aa) <- NULL
aa

On this day 06/04/2006 16:28, Dimitris Rizopoulos wrote:

try the following:

out <- data.matrix(d)
dimnames(out) <- NULL
out

Thank you very much for your help.

Best,
Muhammad Subianto

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] New behavior in estimable(), bug or feature?
Hello all!

Win2k, R 2.2.1, update.packages() done today.

I just started to work with a new dataset, using lme() (library nlme) and estimable() (library gmodels). I first wanted to establish the fixed effects for eight fertiliser treatments (variable treat) coded A to H. Fitting and reducing a first model for grain yield went smoothly. When I wanted to look at the fixed effects with estimable() I got an error message claiming that I was using the wrong variable names; estimable() wanted the variable names in the form usually given by fixed.effects(). I changed the names in my matrix to the requested form, but got the same error message. Then I switched to an old library with a variety-trial material, using a matrix I _know_ worked two months ago, and got the same type of error message.

What is happening? Is there a new name check in estimable() acting a little over-enthusiastically, or what?

Here is some output from R:

trMat1
A 1 0 0 0 0 0 0 0
B 0 1 0 0 0 0 0 0
C 0 0 1 0 0 0 0 0
D 0 0 0 1 0 0 0 0
E 0 0 0 0 1 0 0 0
F 0 0 0 0 0 1 0 0
G 0 0 0 0 0 0 1 0
H 0 0 0 0 0 0 0 1

formula(m4.y)
y.dm ~ treat - 1

fixed.effects(m4.y)
  treatA   treatB   treatC   treatD   treatE   treatF   treatG   treatH
2065.267 4052.033 4571.479 4933.026 4680.980 5021.347 5063.306 5198.153

estimable(m4.y, trMat1)
Error in FUN(newX[, i], ...) : Invalid parameter name(s): , , , , , , ,
Valid names are: treatA, treatB, treatC, treatD, treatE, treatF, treatG, treatH

All the best
/CG

--
CG Pettersson, MSci, PhD Stud.
Swedish University of Agricultural Sciences (SLU)
Dep. of Crop Production Ekology.
Box 7043. SE-750 07 Uppsala, Sweden
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Multivariate linear regression
Apparently you do not understand the point, and seem to (want to) see patterns all over the place. A good start for the treatment of this interesting disease is 'Fooled by Randomness' by Nassim Nicholas Taleb. The main point of the book is that many things may be a lot more random than one might care to imagine or believe. (Ramsey theory is misleading and of no help here, given its biased premise that complete disorder is impossible (T. S. Motzkin, Wikipedia).)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Nagu
Sent: Wednesday, April 05, 2006 8:09 PM
To: Berton Gunter
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Multivariate linear regression

Hi Bert,

Thank you for your prompt reply. I understand your point. But randomness is just a matter of the scale of the object (Ramsey theory). The X matrix does not explain the complete variation in Y, due to large noise in X, or simply because the mapping f: X -> Y is many-valued (or for a finite number of other reasons). Theoretically the inverse does not exist for many-valued functions. In regression-type problems we are evaluating the pseudoinverse of the data space. To estimate the inverses of many-valued functions we may, theoretically, have to use the branch-cuts method or something called Riemann surfaces, which partition the domain into connected sheets.

As I am not a qualified statistician and do not have much experience building statistical models for highly noisy data, I am wondering how you have dealt with such situations, if any, in your working experience?

I will try your idea of feeding some random variables as predictors in X.

Thank you again,
Nagu

P.S. Why is it that pattern recognition is all about finding patterns that cannot be seen easily, huh?

On 4/5/06, Berton Gunter [EMAIL PROTECTED] wrote:

Ummm... If y is unrelated to x, then why would one expect any reasonable method to show a greater or lesser relationship than any other? It's all random. Of course, put enough random regressors into, or tune the parameters enough of, any regression methodology and you'll be able to precisely predict the data at hand -- but **only** the data at hand. I should note that such work apparently frequently appears in various sorts of informatics/data mining/omics/etc. journals these days, as various papers demonstrating the irreproducibility of numerous purported discoveries have infamously shown. Let us not forget Occam!

Just being cranky ...

-- Bert Gunter

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Nagu
Sent: Wednesday, April 05, 2006 3:52 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Multivariate linear regression

Hi,

I am working on a multivariate linear regression of the form y = Ax. I am seeing a great dispersion of y w.r.t. x. For example, the correlations between y and x are very small, even after using some typical transformations like log and power.

I tried simple linear regression, robust regression, and the ace and avas packages in R (or S-PLUS). I didn't see an improvement in the fit and predictions over simple linear regression (I also tried this with transformed variables).

I am sure that some of you have come across such data. How did you deal with it? Linear regressions are good for data like y = x + 0.01*Normal(mu, sigma^2), i.e. a small noise (data observed in a lab). But linear regressions are bad for large noise, like typical market (or survey) data.
Thank you,
Nagu

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
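Bert's warning about random regressors is easy to reproduce: regress pure noise on enough pure-noise predictors and the in-sample fit looks excellent while out-of-sample prediction is worthless. A sketch:

set.seed(42)
n <- 60; p <- 50
x <- matrix(rnorm(n * p), n, p)   # predictors unrelated to anything
y <- rnorm(n)                     # response unrelated to x
fit <- lm(y ~ x)
summary(fit)$r.squared            # around 0.8-0.9 purely by chance

x.new <- matrix(rnorm(n * p), n, p)
y.new <- rnorm(n)
cor(drop(cbind(1, x.new) %*% coef(fit)), y.new)   # near zero on fresh data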
Re: [R] key position in trellis plotting area
Hi, After posting this I found a collection of functions designed to interact with a trellis plot after it has been created. One such function is trellis.focus, which retrieves the viewport for a specific panel. Calling trellis.focus(name="panel", row=1, column=1, highlight=FALSE) sets the current viewport to the one for the panel in which I want to position the key. Steve

-Original Message- From: Steven Lacey [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 8:55 AM To: 'r-help@stat.math.ethz.ch' Subject: key position in trellis plotting area

Hi, I want to do the following: 1) create a trellis plot with a 1 x 1 layout, 2) add a key in the upper right hand corner of the plotting region (i.e., the panel), but after the initial call to trellis, 3) resize the resulting graphics device without changing the relative position of the key.

For instance, the code below draws the key relative to the device window--not the plotting area.

xyplot(y~x, data=data.frame(x=1:10, y=1:10),
       par.settings=list(background="white"), col="black")
keyArgs <- list()
keyArgs <- list(points=list(pch=17, lwd=2,
                    col=c("transparent", "red", "chartreuse3", "black",
                          "cyan", "blue", "transparent")),
                text=list(lab=c("S-R Mapping", "Color", "Shape", "Letter",
                                "Compatible", "Incompatible", ""),
                          cex=c(1.2, 1, 1, 1, 1, 1, 0.2)),
                text=list(lab=c(expression(R^2),
                                as.character(rep(0.999, 5)), ""),
                          cex=c(1.2, 1, 1, 1, 1, 1, 0.2)))
keyArgs$between <- c(1, 0, 5)
keyArgs$background <- "white"
keyArgs$border <- TRUE
drawKeyArgs <- list(key=keyArgs, draw=FALSE)
keyArgs$draw <- FALSE
key <- do.call(draw.key, drawKeyArgs)
vp1 <- viewport(x=1, y=1, height=unit(1, "grobheight", key),
                width=unit(1, "grobwidth", key), just=c("right", "top"))
pushViewport(vp1)
grid.draw(key)
popViewport()

If I embed the call to draw.key into the panel function, where presumably the plotting area is the current viewport, then it works.

panel.test <- function(x, y, ...) {
    panel.xyplot(x, y, ...)
    keyArgs <- list()
    keyArgs <- list(points=list(pch=17, lwd=2,
                        col=c("transparent", "red", "chartreuse3", "black",
                              "cyan", "blue", "transparent")),
                    text=list(lab=c("S-R Mapping", "Color", "Shape", "Letter",
                                    "Compatible", "Incompatible", ""),
                              cex=c(1.2, 1, 1, 1, 1, 1, 0.2)),
                    text=list(lab=c(expression(R^2),
                                    as.character(rep(0.999, 5)), ""),
                              cex=c(1.2, 1, 1, 1, 1, 1, 0.2)))
    keyArgs$between <- c(1, 0, 5)
    keyArgs$background <- "white"
    keyArgs$border <- TRUE
    drawKeyArgs <- list(key=keyArgs, draw=FALSE)
    keyArgs$draw <- FALSE
    key <- do.call(draw.key, drawKeyArgs)
    vp1 <- viewport(x=1, y=1, height=unit(1, "grobheight", key),
                    width=unit(1, "grobwidth", key), just=c("right", "top"))
    pushViewport(vp1)
    grid.draw(key)
    popViewport()
}
xyplot(y~x, data=data.frame(x=1:10, y=1:10),
       par.settings=list(background="white"), col="black", panel=panel.test)

But I don't want to write panel functions that are specific to the key. I'd like to draw the plot, access the plotting-area viewport, push it, and add the key. Something may be possible with baseViewports, but I can't get that to work properly with a 1 x 1 lattice plot, and the relative position of the key changes when the device is rescaled. Is there an easy way to do this in the initial call to xyplot? My reading of the documentation is that a key may be positioned in the npc coordinates of the device region, but not the plotting region. Is that correct? Thanks, Steve
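For the archive, the two pieces fit together as below; a minimal sketch of the trellis.focus()/draw.key() workflow (the key contents here are placeholders, not the real key, and whether the position survives a device resize should be checked):

library(lattice)
library(grid)

xyplot(y ~ x, data = data.frame(x = 1:10, y = 1:10))

trellis.focus(name = "panel", row = 1, column = 1,
              highlight = FALSE)       # the panel viewport becomes current
key <- draw.key(list(points = list(pch = 17, col = "red"),
                     text = list(lab = "group 1")),
                draw = FALSE)
vp <- viewport(x = 0.98, y = 0.98,
               height = unit(1, "grobheight", key),
               width = unit(1, "grobwidth", key),
               just = c("right", "top"))
pushViewport(vp)
grid.draw(key)                         # key drawn in panel, not device, coordinates
popViewport()
trellis.unfocus()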
[R] Sorting problem
R-gurus... I've got a 5-column data frame where I'd like to plot each ID's b against c, with b in ascending order (within the same ID). How do I sort b so that the other variables are altered equally?

 ID  a     b     c      d
101  1   240  26.7  21.85
101  2   335  21.8  21.85
101  3  1387  26.6  21.85
101  4   877  24.1  21.85
101  5   978  24.3  21.85
101  7  4962  28.3  21.85
101  8  1690  26.1  21.85
101  9  2038  24.4  21.85
102  1   240  53.4  51.65
102  2   533  50.4  51.65
102  3   802  53.3  51.65
102  4  1155  47.0  51.65
102  5   953  47.7  51.65
102  7  5353  47.8  51.65
...
115  9   840  34.9  42.01

All the best, Kare
Re: [R] imaging and contouring
The help for filled.contour suggests using the plot.axes argument to annotate a plot; try changing the last part of your 3rd example to:

filled.contour(nlong, nlat, z1, col=rainbow(100),
               xlab="longitude", ylab="latitude",
               plot.axes=map(add=TRUE))

or

filled.contour(nlong, nlat, z1, col=rainbow(100),
               xlab="longitude", ylab="latitude",
               plot.axes={box(); axis(1); axis(2); map(add=TRUE)})

Hope this helps -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nicolas Degallier Sent: Tuesday, April 04, 2006 8:20 AM To: Ethz. Ch Cc: Roger Bivand Subject: [R] imaging and contouring

Dear R'Helpers and Colleagues, I have looked in the documentation and asked some colleagues in the lab, but was unable to solve the following difficulty. I am running R on an iMac G5 with OS 10.4. The file below (73 rows x 144 columns) shows the values of a climatic index on the globe on a 2.5° x 2.5° grid (NA = no value). With image() and map(), running the following function (comments translated from French):

dessin_i <- function(nomfichier="ERA40NA.dat", lat=73, long=144) {  # define the function
  library(maps)
  a <- read.table(nomfichier)            # read the file
  z <- array(0, dim=c(long, lat))        # create an array of 0s
  z <- t(a)                              # transpose matrix a into z
  z1 <- rbind(z[65:long, ], z[1:64, ])   # centre the map on Greenwich
  nlat <- seq(-90, 90, 2.5)              # latitudes / y values in 2.5° steps
  nlong <- seq(-180, 177.5, 2.5)         # longitudes / x values in 2.5° steps
  image(nlong, nlat, z1, col=rainbow(100),
        xlab="longitude", ylab="latitude")  # draw
  map(add=TRUE)                          # overlay the map of continents and countries
}

and then dessin_i(), I got the following figure (anonymous ftp): ftp://ftp.lodyc.jussieu.fr/ndelod/ND_Fig1.pdf

Running the following:

dessin_c <- function(nomfichier="ERA40NA.dat", lat=73, long=144) {  # define the function
  library(maps)                          # load the maps package
  a <- read.table(nomfichier)            # read the data
  z <- array(0, dim=c(long, lat))        # create an array of 0s
  z <- t(a)                              # transpose matrix a into z
  z1 <- rbind(z[65:long, ], z[1:64, ])   # centre the map on Greenwich
  nlat <- seq(-90, 90, 2.5)
  nlong <- seq(-180, 177.5, 2.5)
  contour(nlong, nlat, z1, col=rainbow(100),
          xlab="longitude", ylab="latitude")
  map(add=TRUE)
}

and dessin_c(), I got the following figure (anonymous ftp): ftp://ftp.lodyc.jussieu.fr/ndelod/ND_Fig2.pdf

Finally, running:

dessin_f <- function(nomfichier="ERA40NA.dat", lat=73, long=144) {  # define the function
  library(maps)
  a <- read.table(nomfichier)
  z <- array(0, dim=c(long, lat))
  z <- t(a)
  z1 <- rbind(z[65:long, ], z[1:64, ])   # centre the map on Greenwich
  nlat <- seq(-90, 90, 2.5)
  nlong <- seq(-180, 177.5, 2.5)
  filled.contour(nlong, nlat, z1, col=rainbow(100),
                 xlab="longitude", ylab="latitude")
  map(add=TRUE)
}

and dessin_f(), I got the following figure (anonymous ftp): ftp://ftp.lodyc.jussieu.fr/ndelod/ND_Fig3.pdf

It may be similar to what I am looking for, i.e. the first figure with smoothed contours and a scale for the colours/values, but I have been unable (or didn't find the right options) to put the map in the right place. Is it possible, and how: 1) to use filled.contour() without the scale? 2) to overlay the map and the coloured contours, correctly aligned? 3) to obtain the colours of the first figure with the filled.contour function?

A further question: how can I obtain level lines that are thicker and/or in another colour only for specified values of z1?

Many thanks in advance. Sincerely, Nicolas Degallier UMR 7159 / IRD UR182 Laboratoire d'Océanographie et du Climat, Expérimentation et Approches Numériques (LOCEAN) Tour 45-55, 4e ét., case 100, 4 place Jussieu 75252 Paris Cedex 5 France tel: (33) 01 44 27 51 57 fax: (33) 01 44 27 38 05 E-mail: [EMAIL PROTECTED] pdf reprints (anonymous ftp): ftp://ftp.lodyc.jussieu.fr/ndelod
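A self-contained illustration of Greg's plot.axes idea, using the built-in volcano matrix in place of the ERA40 grid and map(); anything drawn inside plot.axes lands in the panel region of the filled.contour display rather than on a fresh plot:

filled.contour(volcano, color.palette = rainbow,
               plot.axes = {
                   axis(1); axis(2)
                   contour(volcano, add = TRUE)   # stand-in for map(add=TRUE)
               })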
Re: [R] Uneven y-axis scale
Have you thought about using a log scale?

Clint Bowman            INTERNET: [EMAIL PROTECTED]
Air Quality Modeler     INTERNET: [EMAIL PROTECTED]
Department of Ecology   VOICE: (360) 407-6815
PO Box 47600            FAX: (360) 407-7534
Olympia, WA 98504-7600

On Thu, 6 Apr 2006, Jim Lemon wrote: Kåre Edvardsen wrote: Dear R-gurus! Is it possible within boxplot to break the y-scale into two (or anything)? I'd like to have a normal linear y-range from 0-10 and the next tick mark starting at, say, 60 and continuing to 90. The reason is for better visualising the outliers.

Hi Kare, In the plotrix package there are two functions, gap.plot and gap.barplot, that do something like this. If you think that they are sufficiently close to what you want, I could have a try at writing a gap.boxplot function. Jim
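A minimal sketch of what Jim describes, with made-up data (assuming plotrix's gap.plot() interface, where gap = c(lower, upper) gives the interval to cut out of the y axis):

library(plotrix)                             # gap.plot()
y <- c(runif(20, 0, 10), runif(3, 60, 90))   # bulk of the data plus outliers
gap.plot(seq_along(y), y, gap = c(10, 60), gap.axis = "y")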
Re: [R] Help on hypothesis testing
There is a z.test function in the TeachingDemos package. Note however that this function is meant for teaching purposes, as a stepping stone for students to learn the syntax and output of hypothesis-test functions before learning about t tests etc. The z.test function only does one-sample tests. If you need more functionality you can look at the code for the function (as well as the code for the functions t.test and var.test) to see how to write your own function (or extend one of the above). For simple tests you can compute the test statistic by hand and use the pnorm and pchisq functions. If you want the output in the standard hypothesis-test format then you will need to write or extend a function to do that (it's not hard). -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare [EMAIL PROTECTED] (801) 408-8111

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jinsong Zhao Sent: Thursday, April 06, 2006 5:17 AM To: R-help Subject: [R] Help on hypothesis testing

Hi, I hope to use R to perform some hypothesis testing, such as t.test() and var.test(). However, I cannot find a function that does a test on means with the variance known (i.e., the u test or z test in some textbooks), nor a function that does a test on the variances of a normal distribution (i.e. the chi-squared test). Thanks in advance for any hints. Best wishes, Jinsong
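Following Greg's by-hand suggestion, a sketch of both tests with pnorm() and pchisq(); the data, mu0 and sigma0 below are invented for illustration:

x <- rnorm(30, mean = 5.2, sd = 2)      # example data
mu0 <- 5; sigma0 <- 2; n <- length(x)   # hypothesized mean, known sigma

## one-sample z test (two-sided)
z <- (mean(x) - mu0) / (sigma0 / sqrt(n))
p.z <- 2 * pnorm(-abs(z))

## chi-squared test on the variance (two-sided)
chi <- (n - 1) * var(x) / sigma0^2
p.chi <- 2 * min(pchisq(chi, n - 1),
                 pchisq(chi, n - 1, lower.tail = FALSE))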
Re: [R] Sorting problem
Please read the Help files before posting! At the bottom of ?sort you will find a link to 'order', which is the answer to your question. -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kåre Edvardsen Sent: Thursday, April 06, 2006 8:01 AM To: R-help Subject: [R] Sorting problem

R-gurus... I've got a 5-column data frame where I'd like to plot each ID's b against c, with b in ascending order (within the same ID). How do I sort b so that the other variables are altered equally? [snip] All the best, Kare
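A minimal sketch of the 'order' idiom for this problem, assuming the data frame is called dat and has the ID, a, b, c, d columns shown in the question:

dat.s <- dat[order(dat$ID, dat$b), ]   # sort rows by ID, then ascending b;
                                       # all other columns travel along
with(subset(dat.s, ID == 101),
     plot(b, c, type = "b"))           # one ID's b against c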
Re: [R] R CMD check for packages in a bundle
I think you have to remove the DESCRIPTION.in file when you check the package stand-alone. As I recall, if it exists, that is the flag to indicate a bundle. Also beware that the DESCRIPTION file in the stand-alone package is not exactly the same as the DESCRIPTION.in file when it is part of the bundle. What you are trying to do does work; I do it all the time, but there are a few details to work out. Because of the above differences I actually generate the DESCRIPTION* files from my Makefile, differently depending on the target. Paul Gilbert

Robin Hankin wrote: Hi [MacOsX 10.4.6; R-2.2.1] I have a bundle that comprises three packages. I want to run R CMD check on each one individually, as it takes a long time to run on all three. I am having problems. The bundle as a whole passes R CMD check, but it fails when I cd to the bundle directory and run R CMD check on a package directory.

The whole bundle passes:

octopus:~/scratch% R CMD check ./BACCO
* checking for working latex ... OK
* using log directory '/Users/rksh/scratch/BACCO.Rcheck'
* using R version 2.2.1, 2005-12-20
* checking for file 'BACCO/DESCRIPTION' ... OK
* looks like 'BACCO' is a package bundle
* this is bundle 'BACCO' version '1.0-29'
* checking if this is a source bundle ... OK
[snip]
** creating approximator-manual.tex ... OK
** checking approximator-manual.tex ... OK
octopus:~/scratch%

but R CMD check fails on the packages individually:

octopus:~/scratch/BACCO% R CMD check ./calibrator
* checking for working latex ... OK
* using log directory '/Users/rksh/scratch/BACCO/calibrator.Rcheck'
* using R version 2.2.1, 2005-12-20
* checking for file 'calibrator/DESCRIPTION' ... OK
* looks like 'calibrator' is a package bundle
* this is bundle 'BACCO' version '1.0-29'
* checking if this is a source bundle ... OK
/Library/Frameworks/R.framework/Resources/bin/INSTALL: line 1: cd: /Users/rksh/scratch/BACCO/calibrator/emulator: No such file or directory
/Library/Frameworks/R.framework/Resources/bin/INSTALL: line 1: cd: /Users/rksh/scratch/BACCO/calibrator/calibrator: No such file or directory
/Library/Frameworks/R.framework/Resources/bin/INSTALL: line 1: cd: /Users/rksh/scratch/BACCO/calibrator/approximator: No such file or directory
ERROR: no 'Package' field in 'DESCRIPTION'
ERROR Installation failed.
octopus:~/scratch/BACCO%

Both DESCRIPTION and DESCRIPTION.in files seem to be OK:

octopus:~/scratch/BACCO/calibrator% cat DESCRIPTION
Package: calibrator
Description: Performs Bayesian calibration of computer models as per [snip]
Title: Bayesian calibration of computer models
Bundle: BACCO
Version: 1.0-29
Date: 13 October 2005
Depends: R (>= 2.0.0), mvtnorm, adapt
Contains: emulator calibrator approximator
Title: Bayesian Analysis of Computer Code Output
BundleDescription: Bayesian analysis of computer code software. The bundle contains routines that evaluate the formulae of Kennedy, O'Hagan, and Oakley.
License: GPL
Author: Robin K. S. Hankin [EMAIL PROTECTED]
Maintainer: Robin K. S. Hankin [EMAIL PROTECTED]
URL: http://www.noc.soton.ac.uk/JRD/SAT/pers/rksh.html

octopus:~/scratch/BACCO/calibrator% cat DESCRIPTION.in
Package: calibrator
Description: Performs Bayesian calibration of computer models as per [snip]
Title: Bayesian calibration of computer models

-- Robin Hankin Uncertainty Analyst National Oceanography Centre, Southampton European Way, Southampton SO14 3ZH, UK tel 023-8059-7743
[R] pros and cons of robust regression? (i.e. rlm vs lm)
Can anyone comment or point me to a discussion of the pros and cons of robust regression, vs. a more manual approach to trimming outliers and/or normalizing the data used in regression analysis?
Re: [R] interpreting anova summary tables - newbie
On Thu, 6 Apr 2006, Andrew McDonagh wrote: My question relates to the meaning of the p-values. Do the p-values relate to a) the confidence in the estimate or b) the confidence that the non-intercept categories are different from the intercept?

Both (given a loose interpretation of your words and some assumptions that a newbie is using the default settings of e.g. contrasts). The estimates are of the difference in intercept from the reference category, I guess Mutant.IDA. The p-values refer to the test that the particular (population) difference is zero. However, the p-values are for a test done in isolation. Here you are doing 41 tests (ignoring the intercept), and I think you should be making some allowance for multiple comparisons. See e.g. the package multcomp. My preference (and it is widely shared) would be to present confidence intervals for the estimated differences you are interested in (which might be all the differences from the reference category, or something else, as, just guessing, you might be interested to know if there is a difference between IDG and IDH).

On Thu, 6 Apr 2006, Andrew McDonagh wrote:

Hello, Apologies if this is the wrong list, I am a first-time poster here. I have an experiment in which an output is measured in response to 42 different categories. I am only interested in which of the categories is significantly different from a reference category. Here is the summary of the results:

summary(simple.fit)

Call: lm(formula = as.numeric(as.vector(TNFa)) ~ Mutant.ID, data = imputed.data)

Residuals:
      Min        1Q    Median        3Q       Max
 -238.459   -25.261    -0.868    25.660   309.496

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    49.0479    10.5971   4.628 5.08e-06 ***
Mutant.IDB    149.8070    23.1632   6.467 3.09e-10 ***
Mutant.IDC     98.7443    23.1632   4.263 2.55e-05 ***
Mutant.IDD     97.2203    23.1632   4.197 3.37e-05 ***
Mutant.IDE    118.9820    23.1632   5.137 4.49e-07 ***
Mutant.IDF    241.8537    23.1632  10.441  < 2e-16 ***
Mutant.IDG    107.4883    23.1632   4.640 4.80e-06 ***
Mutant.IDH    105.7664    23.1632   4.566 6.74e-06 ***
Mutant.IDI    517.4650    23.1632  22.340  < 2e-16 ***
Mutant.IDJ     19.        23.1632   0.854 0.393735
Mutant.IDK     47.4240    23.1632   2.047 0.041313 *
Mutant.IDL      3.2542    23.1632   0.140 0.888347
Mutant.IDM    180.9638    23.1632   7.813 5.63e-14 ***
Mutant.IDN     19.0582    23.1632   0.823 0.411155
Mutant.IDO     61.8684    23.1632   2.671 0.007891 **
Mutant.IDP     -0.5306    23.1632  -0.023 0.981738
Mutant.IDQ    -10.6972    23.1632  -0.462 0.644478
Mutant.IDR      1.5377    23.1632   0.066 0.947107
Mutant.IDS     14.6333    23.1632   0.632 0.527934
Mutant.IDT     48.8900    23.1632   2.111 0.035458 *
Mutant.IDU     58.9597    23.1632   2.545 0.011313 *
Mutant.IDV     81.7657    23.1632   3.530 0.000467 ***
Mutant.IDW     82.9576    23.1632   3.581 0.000386 ***
Mutant.IDY     49.1926    23.1632   2.124 0.034343 *
Mutant.IDZ     51.0381    23.1632   2.203 0.028170 *
Mutant.IDZA   116.0487    23.1632   5.010 8.38e-07 ***
Mutant.IDZB    56.4402    23.1632   2.437 0.015287 *
Mutant.IDZC   -14.5305    23.1632  -0.627 0.530838
Mutant.IDZD    -5.0069    23.1632  -0.216 0.828983
Mutant.IDZE     9.1176    23.1632   0.394 0.694080
Mutant.IDZF   232.2879    23.1632  10.028  < 2e-16 ***
Mutant.IDZG   -27.1671    23.1632  -1.173 0.241595
Mutant.IDZH     0.8757    23.1632   0.038 0.969862
Mutant.IDZI     4.7952    23.1632   0.207 0.836108
Mutant.IDZJ    -5.5859    23.1632  -0.241 0.809568
Mutant.IDZK   -12.9263    23.1632  -0.558 0.577138
Mutant.IDZL    38.8621    23.1632   1.678 0.094224 .
Mutant.IDZM    39.2643    23.1632   1.695 0.090880 .
Mutant.IDZN    73.8419    23.1632   3.188 0.001553 **
Mutant.IDZO   147.7804    23.1632   6.380 5.20e-10 ***
Mutant.IDZP     0.5654    23.1632   0.024 0.980540
Mutant.IDZQ    50.5117    23.1632   2.181 0.029824 *
Mutant.IDZR   217.6824    23.1632   9.398  < 2e-16 ***
Mutant.IDZS   237.3227    23.1632  10.246  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 61.79 on 377 degrees of freedom
Multiple R-Squared: 0.7351, Adjusted R-squared: 0.7049
F-statistic: 24.33 on 43 and 377 DF, p-value: < 2.2e-16

My question relates to the meaning of the p-values. Do the p-values relate to a) the confidence in the estimate or b) the confidence that the non-intercept categories are different from the intercept? Somebody mentioned to me that the p-value for the intercept is the confidence in the estimate of the intercept, whereas the remaining entries are the confidence in each strain being different from the reference / intercept. Note the contrasts setting is contr.treatment. Any help would be appreciated.

Andrew McDonagh, PhD Candidate, Department of Infectious Diseases, Commonwealth Building, Hammersmith Hospital, Du Cane Road, London W12 ONN [EMAIL PROTECTED]
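Two lines that implement the suggestions above, assuming the fitted model is the simple.fit shown (Holm's adjustment is used for illustration; multcomp can give sharper simultaneous intervals):

ci <- confint(simple.fit)                      # CIs for each difference from IDA

p <- summary(simple.fit)$coefficients[-1, 4]   # the 41 p-values, intercept dropped
p.adj <- p.adjust(p, method = "holm")          # familywise-error control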
[R] recommendation for post-hoc tests after anova
Greetings all, I've done some ANOVAs and have found significant effects across my groups, so now I have to do some post-hoc tests to ascertain which of the groups is driving the significant effect. Usually one would do something like a Newman-Keuls or Scheffé test to get at this, but based on googling and browsing through the r-help archives, R doesn't support these, and they seem to be badly received by the R community generally (for reasons related to alpha not being properly controlled, if I understand correctly). I found the TukeyHSD function, but on reading the help page I'm unsure whether it's safe for me to use it, since the number of subjects in my groups is not balanced (ctrl=6, short=9, long=9). Would you reckon that this is "mildly unbalanced" in the spirit of the help page, or is my caution warranted? In the absence of Newman-Keuls or Scheffé tests, could someone recommend an appropriate test for me to conduct in this instance? From what I've read in Explaining Psychological Statistics, the REGWQ test seems to be the way to go. Can R perform this test? Thanks for your time, Regards, -- Dr Colm G. Connolly School of Psychology and Institute of Neuroscience The Lloyd Building University of Dublin Trinity College, Dublin 2, Éire
[R] calculating similarity/distance among hierarchically classified items
This is a question about how to calculate similarities/distances among items that are classified by hierarchical attributes, for the purpose of visualizing the relations among items by means of clustering, MDS, self-organizing maps, and so forth.

I have a set of ~260 items that have been classified using two sets of hierarchically-organized codes, on the basis of form and content. The data looks like that below, where the last two variables (ITEMFORM and ITEMCONTENT) are each a ';'-separated list of codes assigned to each item. The items are identified by the KEY variable. (Other fields are ignored here.)

KEY,YEAR,WHERE,CONTENT,FORM,ITEMFORM,ITEMCONTENT
1782Fourcroy,1782,Eur,Hdem,Stats,F5;F5K;F5N;F5N1,C8;C82
1785Crome,1785,Eur,Pdem,Stats,F5;F5N;F5N1,C7
1786Playfair,1786,Eur,Hdem,Stats,F6;F68;F69;F61;F62,C3;C32;C321;C323
1787Chladni,1787,Eur,Other,Other,F5;F55;FH;FD;FD3,C9;C95
1794Buxton,1794,Eur,Other,Tech,F3;F31;F7;F72;F722,C9;C9A
1795Pouchet,1795,Eur,Math,Stats,F5;F5G;FG;FG7,C2
1796Watt,1796,Eur,Pdem,Tech,FGB,C7;C9;C9A
1798Senefelder,1798,Eur,Other,Tech,FB;F5,C9;C97
...

The codes are hierarchical in the sense that, e.g., C321 corresponds to the levels in a tree:

Commerce (C3)
  Internal (C32)
    Labour (C321)

and F5G corresponds to:

Diagram (F5)
  Nomogram (F5G)

so the number of characters in a code is the level in the tree. There are about 290 distinct codes, with varying frequency of use, from 1 to ~40, so the data could be rearranged to a 260 x 290 incidence matrix of items x codes.

In computing similarities between items, all measures I know of for binary attribute data treat the attributes as nominal, and so ignore the hierarchical nature of the codes. To take that into account, the 0/1 values could be replaced by the tree-level values (0=NA, 1..5) of the codes in each column. Then some measure of similarity could be computed based on the profiles for each pair of items. But I don't know what measure (Gower's, Euclidean, etc.) would be (most, or arguably) appropriate here. Is this a situation that anyone recognizes? Or, maybe there is another way to approach this. I'd appreciate any suggestions.

-- Michael Friendly Email: [EMAIL PROTECTED] Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html Toronto, ONT M3J 1P3 CANADA
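One possible starting point, sketched with random data: fill the items x codes matrix with the tree level of each assigned code (NA where the code is absent; whether absence should instead count as 0 is exactly the modelling question raised above) and feed it to daisy() in package cluster with metric = "gower":

library(cluster)                      # daisy()

m <- matrix(sample(c(NA, 1:3), 260 * 290, replace = TRUE), nrow = 260)
d <- daisy(as.data.frame(m), metric = "gower")   # NAs skipped pairwise
hc <- hclust(as.dist(d), method = "average")     # then cluster, cmdscale(), etc.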
Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)
There is a **huge** literature on robust regression, including many books that you can search for at e.g. Amazon. I think it fair to say that we have known since at least the 1970's that practically any robust downweighting procedure (see, e.g., M-estimation) is preferable (more efficient, better continuity properties, better estimates) to trimming outliers defined by arbitrary thresholds. An excellent but now probably dated introductory discussion can be found in UNDERSTANDING ROBUST AND EXPLORATORY DATA ANALYSIS edited by Hoaglin, Tukey, Mosteller, et al. The rub in all this is that nice small-sample inference results go out the window, though bootstrapping can help with this. Nevertheless, for a variety of reasons, my recommendation is simply to **never** use lm and **always** use rlm (with maybe a few minor caveats). Many would disagree with this, however. I don't think normalizing data as it's conventionally used has anything to do with robust regression, btw. -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA "The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of r user Sent: Thursday, April 06, 2006 8:51 AM To: rhelp Subject: [R] pros and cons of robust regression? (i.e. rlm vs lm)

Can anyone comment or point me to a discussion of the pros and cons of robust regression, vs. a more manual approach to trimming outliers and/or normalizing data used in regression analysis?
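A sketch of the lm/rlm comparison on the classic stackloss data (rlm() lives in package MASS); the final weights show which observations the M-estimator downweights, which also serves as a diagnostic:

library(MASS)                                    # rlm()

fit.ls <- lm(stack.loss ~ ., data = stackloss)
fit.rr <- rlm(stack.loss ~ ., data = stackloss)  # Huber M-estimation by default

cbind(LS = coef(fit.ls), robust = coef(fit.rr))  # compare coefficients
round(fit.rr$w, 2)                               # weights well below 1 flag
                                                 # downweighted observations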
Re: [R] R2WinBUGS error Solved, THANK YOU!
Thanks very much Uwe! There was a problem reading my data, which I would not have discovered without careful examination of the WinBUGS log file. The program is working now! Take care, Joe

-Original Message- From: Uwe Ligges [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 1:30 AM To: Joseph Retzer Cc: r-help@stat.math.ethz.ch Subject: Re: [R] R2WinBUGS error [snip]
[R] Reshaping genetic data from long to wide
Bottom Line Up Front: How does one reshape genetic data from long to wide?

I currently have a lot of data: about 180 individuals (some probands/patients, some parents, rare siblings) and SNP data from 6000 loci on each. The standard formats seem to be something along the lines of

Famid, pid, fatid, motid, affected, sex, locus1Allele1, locus1Allele2, locus2Allele1, locus2Allele2, etc.

In other words, one human, one row. If there were multiple loci then the variables would continue to be heaped up on the right. This kind of orientation shall be referred to as "wide".

Given how big my dataset is, it is easier to manage the data in the database in the long format. In this format I have a pedigree table and, from it, a one-to-many relationship with the SNP data. The SNP table has the fields uniqueHumanID, Allele1, Allele2, locus. That makes for an incredibly long table. Data is stored in a Sybase database that I communicate with through ODBC using Microsoft Access. The RODBC package then reads the queries that I have created in Microsoft Access. The only reason for Microsoft Access is that I have had well over a decade's worth of experience using it at an intermediate level. With the magic of SQL I can mix and match these tables. But creating the table that is 180 rows long and about 12010 variables wide is daunting. Essentially, the 6000 SNPs represent each human having 12000 repeated measures (6000 SNPs times 2 alleles).

I presume I would be able to use the reshape function in R: "Reshape Grouped Data. Description: This function reshapes a data frame between 'wide' format, with repeated measurements in separate columns of the same record, and 'long' format, with the repeated measurements in separate records."

BUT BEFORE I launch into this: is there a way that either the Warnes package (genetics) or the David Clayton package can handle the data in the long form? If not, do any of the packages reshape the data in a way that is pedigree- and genotype-aware? The general R reshape function is not predesigned to be friendly to genetic data.

Farrel Buchinsky, MD --- Mobile (412) 779-1073 Pediatric Otolaryngologist Allegheny General Hospital Pittsburgh, PA
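For the record, base reshape() does handle exactly this layout; a sketch with a hypothetical three-person, two-locus table using the field names described above (whether it is comfortable at 180 x 12000 columns is another matter, and it is certainly not pedigree-aware):

snps <- data.frame(uniqueHumanID = rep(1:3, each = 2),
                   locus   = rep(c("rs1", "rs2"), 3),
                   Allele1 = sample(c("A", "G"), 6, replace = TRUE),
                   Allele2 = sample(c("A", "G"), 6, replace = TRUE))

wide <- reshape(snps, idvar = "uniqueHumanID", timevar = "locus",
                v.names = c("Allele1", "Allele2"), direction = "wide")
## one row per person, columns Allele1.rs1, Allele2.rs1, Allele1.rs2, ...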
Re: [R] skipping rows in trellis key
On 4/5/06, Steven Lacey [EMAIL PROTECTED] wrote: Try this...

xyplot(y~x, data=data.frame(x=1:10, y=1:10))
keyArgs <- list()
keyArgs <- list(points=list(pch=c(NA, rep(17, 5)), lwd=2,
                    col=c(NA, c("red", "chartreuse3", "black",
                                "cyan", "blue"))),
                text=list(lab=c("S-R Mapping", "Color", "Shape", "Letter",
                                "Compatible", "Incompatible"),
                          cex=c(1.2, 1, 1, 1, 1, 1)),
                text=list(lab=c(expression(R^2),
                                as.character(rep(0.999, 5))),
                          cex=c(1.2, 1, 1, 1, 1, 1)))
keyArgs$between <- c(1, 0, 5)
keyArgs$columns <- 1
keyArgs$column.between <- 1
keyArgs$border <- TRUE
drawKeyArgs <- list(key=keyArgs, draw=FALSE)
keyArgs$draw <- FALSE
key <- do.call(draw.key, drawKeyArgs)
vp1 <- viewport(x=1, y=1, height=unit(1, "grobheight", key),
                width=unit(1, "grobwidth", key), just=c("right", "top"))
pushViewport(vp1)
grid.draw(key)
popViewport()

Hmm. I'll add a component called 'padding.text' in R 2.3.0 which can be used to control the padding around text rows. Deepayan
Re: [R] key position in trellis plotting area
On 4/6/06, Steven Lacey [EMAIL PROTECTED] wrote: Hi, I want to do the following: 1) create a trellis plot with a 1 x 1 layout, 2) add a key in the upper right hand corner of the plotting region (i.e., the panel), but after the initial call to trellis, 3) resize the resulting graphics device without changing the relative position of the key. For instance, the code below draws the key relative to the device window--not the plotting area.

When I first wrote draw.key, I thought about which choice made more sense, and decided on the device window (or, more accurately, the total graphing area) because it is slightly more general. I eventually decided that there should be an option to support both, but I haven't gotten around to implementing that yet. It won't happen in time for R 2.3.0, but perhaps in some later update. Deepayan
Re: [R] Uneven y-axis scale
Yes, I have tried that method, but it does not work well for some of my data.

On 4/6/06, Clint Bowman [EMAIL PROTECTED] wrote: Have you thought about using a log scale? [snip]
Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)
To add to Bert's comments:

- Normalizing data (e.g., subtracting the mean and dividing by the SD) can help the numerical stability of the computation, but that's mostly unnecessary with modern hardware. As Bert said, it has nothing to do with robustness.

- Instead of _replacing_ lm() with rlm() or another robust procedure, I'd do both of them. Some scientists view robust procedures that omit some data points (e.g., by assigning basically 0 weight to them) in automatic fashion, with the result just trusted, as bad science, and I think they have a point. Use of a robust procedure does not free one from examining the data carefully and looking at diagnostics. Careful treatment of outliers is especially important, I think, for data coming from a confirmatory experiment. If the conclusion you draw depends on downweighting or omitting certain data points, you ought to have a very good reason for doing so. I think it cannot be over-emphasized how important it is not to take outlier deletion lightly. I've seen many cases where what seemed like an outlier originally turned out to be legitimate data, and omitting it just led to an overly optimistic assessment of variability.

Andy

From: Berton Gunter There is a **Huge** literature on robust regression ... [snip]
Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)
Thanks, Andy. Well said. Excellent points. The final weights from rlm serve this diagnostic purpose, of course. -- Bert

-Original Message- From: Liaw, Andy [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 9:56 AM To: 'Berton Gunter'; 'r user'; 'rhelp' Subject: RE: [R] pros and cons of robust regression? (i.e. rlm vs lm) [snip]
Re: [R] R CMD check for packages in a bundle
On Thu, 6 Apr 2006, Paul Gilbert wrote: I think you have to remove the DESCRIPTION.in file when you check the package stand-alone. As I recall, if it exists, that is the flag to indicate a bundle. [snip]

I do too. Having a DESCRIPTION.in is irrelevant: having a Contains: line in DESCRIPTION is the test used. Can we please read the posting guide and not use R-help for 'questions and discussion about code development in R', which is the description of the R-devel list.

-- Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
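To make the point concrete, a stand-alone DESCRIPTION for one package of the bundle would look roughly like the sketch below (a hypothetical reconstruction, not Robin's actual file): a Package: field is present, and there is no Contains: or Bundle: line, so the tools treat it as an ordinary package.

Package: calibrator
Version: 1.0-29
Title: Bayesian calibration of computer models
Description: Performs Bayesian calibration of computer models.
Depends: R (>= 2.0.0), mvtnorm, adapt
License: GPL
Author: Robin K. S. Hankin
Maintainer: Robin K. S. Hankin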
Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Berton Gunter Sent: 06 April 2006 14:22 To: 'r user'; 'rhelp' Subject: Re: [R] pros and cons of robust regression? (i.e. rlm vs lm) There is a **Huge** literature on robust regression ... [snip]

In the mixture-of-distributions approach of ADMB's robust_regression(x,y,a) command, there is no need to abandon the likelihood function for a more general function. The outliers are assumed to come from another, contaminating distribution, with extra parameter a, and then a proper, more complete, likelihood function is used. Also it seems that the mixture-of-distributions approach is more interpretable, more related to physical mechanisms generating departures from the distributional assumptions. In a paper on nonlinear models for the growth of certain marine animals where I used ADMB robust regression, I argued that the outliers were produced by human errors in the reading of age in certain hard structures in the body of the animals. This was consistent with the structure of the likelihood, which consisted of the mixture of a normal and another, contaminating distribution with fatter tails, operating mostly at higher values of the predictor variable (age). Ruben
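Ruben's contaminated-likelihood idea can be sketched in a few lines of R; this is a toy reconstruction of the concept, not ADMB's actual robust_regression() (the contaminating component is arbitrarily taken to be 10 times wider, and the mixing proportion a is estimated along with the rest):

negll <- function(par, r) {
    m <- par[1]; s <- exp(par[2]); a <- plogis(par[3])   # keep s > 0, 0 < a < 1
    -sum(log((1 - a) * dnorm(r, m, s) + a * dnorm(r, m, 10 * s)))
}
r <- c(rnorm(95), rnorm(5, 0, 8))          # mostly clean residuals, a few wild
fit <- optim(c(0, 0, -2), negll, r = r)    # Nelder-Mead on the 3 parameters
plogis(fit$par[3])                         # estimated contamination fraction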
Re: [R] recommendation for post-hoc tests after anova
You might want to take a look at the multcomp package, which does it in more modern fashion. The "tukey" or "dunnett" options there refer to the types of comparisons (Tukey means all pairwise, Dunnett means all vs. control) rather than the named procedures. The package has a vignette that ought to be helpful to you. Andy

From: Colm Connolly Greetings all, I've done some ANOVAs and have found significant effects across my groups, so now I have to do some post-hoc tests to ascertain which of the groups is driving the significant effect. [snip]
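A sketch of that usage, assuming a version of multcomp with the glht() interface; the data frame dat and its columns are hypothetical stand-ins for Colm's data:

library(multcomp)                                # glht(), mcp()

fit <- aov(score ~ group, data = dat)            # group: ctrl/short/long
ph  <- glht(fit, linfct = mcp(group = "Tukey"))  # all pairwise comparisons;
                                                 # unbalanced n is handled
summary(ph)                                      # adjusted p-values
confint(ph)                                      # simultaneous intervals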
Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)
I'm asking this question purely for my own benefit, not to try to correct anyone. The procedure you refer to as normalization I have always heard referred to as standardization. Is the former the proper term? Also, you say its not necessary given today's hardware, but isn't it beneficial to get all the variables in a similar range? Is thre any other transformation that you would suggest? I use rlm (and normalization) in my models I use every day, so I was happy to read the above comments. Thanks, Roger On 4/6/06, Berton Gunter [EMAIL PROTECTED] wrote: Thanks, Andy. Well said. Excellent points. The final weights from rlm serve this diagnostic purpose, of course. -- Bert -Original Message- From: Liaw, Andy [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 9:56 AM To: 'Berton Gunter'; 'r user'; 'rhelp' Subject: RE: [R] pros and cons of robust regression? (i.e. rlm vs lm) To add to Bert's comments: - Normalizing data (e.g., subtracting mean and dividing by SD) can help numerical stability of the computation, but that's mostly unnecessary with modern hardware. As Bert said, that has nothing to do with robustness. - Instead of _replacing_ lm() with rlm() or other robust procedure, I'd do both of them. Some scientists view robust procedures that omit some data points (e.g., by assigning basically 0 weight to them) in automatic fashion and just trust the result as bad science, and I think they have a point. Use of robust procedure does not free one from examining the data carefully and looking at diagnostics. Careful treatment of outliers is esspecially important, I think, for data coming from a confirmatory experiment. If the conclusion you draw depends on downweighting or omitting certain data points, you ought to have very good reason for doing so. I think it can not be over-emphasized how important it is not to take outlier deletion lightly. I've seen many cases that what seems like outlier originally turned out to be legitimate data, and omission of them just lead to overly optimistic assessment of variability. Andy From: Berton Gunter There is a **Huge** literature on robust regression, including many books that you can search on at e.g. Amazon. I think it fair to say that we have known since at least the 1970's that practically any robust downweighting procedure (see, e.g M-estimation) is preferable (more efficient, better continuity properties, better estimates) to trimming outliers defined by arbitrary threshholds. An excellent but now probably dated introductory discussion can be found in UNDERSTANDING ROBUST AND EXPLORATORY DATA ANALYSIS edited by Hoaglin, Tukey, Mosteller, et. al. The rub in all this is that nice small sample inference results go our the window, though bootstrapping can help with this. Nevertheless, for a variety of reasons, my recommendation is simply to **never** use lm and **always** use rlm (with maybe a few minor caveats). Many would disagree with this, however. I don't think normalizing data as it's conventionally used has anything to do with robust regression, btw. -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA The business of the statistician is to catalyze the scientific learning process. - George E. P. Box -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of r user Sent: Thursday, April 06, 2006 8:51 AM To: rhelp Subject: [R] pros and cons of robust regression? (i.e. rlm vs lm) Can anyone comment or point me to a discussion of the pros and cons of robust regressions, vs. 
a more manual approach to trimming outliers and/or normalizing data used in regression analysis? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
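To make the comparison concrete, here is a minimal sketch (made-up data, not code from the thread) of fitting both estimators to data containing one gross outlier and then inspecting the final rlm weights that serve the diagnostic purpose discussed above:

library(MASS)                    # provides rlm()
set.seed(42)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.3)
y[20] <- 25                      # one gross outlier
fit.ls  <- lm(y ~ x)
fit.rob <- rlm(y ~ x)            # M-estimation (Huber) by default
coef(fit.ls)                     # slope pulled away by the outlier
coef(fit.rob)                    # close to the true 0.5
round(fit.rob$w, 2)              # final IWLS weights; near 0 flags point 20

The near-zero weight on observation 20 is the cue to examine that point closely rather than silently drop it.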
Re: [R] skipping rows in trellis key
Deepayan, I also noticed that the width of the key object is off if the title of the key is bigger than the columns. For example,

xyplot(y ~ x, data = data.frame(x = 1:10, y = 1:10))
keyArgs <- list()
keyArgs <- list(points = list(pch = c(NA, rep(17, 5)), lwd = 2,
                    col = c(NA, c("red", "chartreuse3", "black", "cyan", "blue"))),
                text = list(lab = c("S-R Mapping", "Color", "Shape", "Letter",
                    "Compatible", "Incompatible"), cex = c(1.2, 1, 1, 1, 1, 1)),
                text = list(lab = c(expression(R^2), as.character(rep(0.999, 5))),
                    cex = c(1.2, 1, 1, 1, 1, 1)))
keyArgs$title <- "try this very, very, very long title"
keyArgs$cex.title <- 1.4
keyArgs$between <- c(1, 0, 5)
keyArgs$columns <- 1
keyArgs$column.between <- 1
keyArgs$border <- TRUE
drawKeyArgs <- list(key = keyArgs, draw = FALSE)
keyArgs$draw <- FALSE
key <- do.call(draw.key, drawKeyArgs)
vp1 <- viewport(x = 1, y = 1, height = unit(1, "grobheight", key),
                width = unit(1, "grobwidth", key), just = c("right", "top"))
pushViewport(vp1)
grid.draw(key)
popViewport()

What is a good workaround? Thanks, Steve

-Original Message- From: Deepayan Sarkar [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 12:32 PM To: Steven Lacey Cc: r-help@stat.math.ethz.ch Subject: Re: skipping rows in trellis key

On 4/5/06, Steven Lacey [EMAIL PROTECTED] wrote: Try this...

xyplot(y ~ x, data = data.frame(x = 1:10, y = 1:10))
keyArgs <- list()
keyArgs <- list(points = list(pch = c(NA, rep(17, 5)), lwd = 2,
                    col = c(NA, c("red", "chartreuse3", "black", "cyan", "blue"))),
                text = list(lab = c("S-R Mapping", "Color", "Shape", "Letter",
                    "Compatible", "Incompatible"), cex = c(1.2, 1, 1, 1, 1, 1)),
                text = list(lab = c(expression(R^2), as.character(rep(0.999, 5))),
                    cex = c(1.2, 1, 1, 1, 1, 1)))
keyArgs$between <- c(1, 0, 5)
keyArgs$columns <- 1
keyArgs$column.between <- 1
keyArgs$border <- TRUE
drawKeyArgs <- list(key = keyArgs, draw = FALSE)
keyArgs$draw <- FALSE
key <- do.call(draw.key, drawKeyArgs)
vp1 <- viewport(x = 1, y = 1, height = unit(1, "grobheight", key),
                width = unit(1, "grobwidth", key), just = c("right", "top"))
pushViewport(vp1)
grid.draw(key)
popViewport()

Hmm. I'll add a component called 'padding.text' in R 2.3.0 which can be used to control the padding around text rows. Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)
A great example of the hazards of automatic outlier rejection is the story of how the hole in the ozone layer in the southern hemisphere was discovered. Outliers were dutifully entered into the data base but discounted as probable metrology problems, which also plagued the investigation. As the percentage of outliers became excessive, investigators ultimately became convinced that many of the outliers were not metrology problems but real physical problems. For a recent discussion of this, see Maureen Christie (2004) 11. Data Collection and the Ozone Hole: Too much of a good thing? Proceedings of the International Commission on History of Meteorology 1.1, pp. 99-105 (www.meteohistory.org/2004proceedings1.1/pdfs/11christie.pdf). spencer graves p.s. I understand that Australia now has one of the world's highest rates of skin cancer, which has contributed to a major change in outdoor styles of dress there.

Berton Gunter wrote: Thanks, Andy. Well said. Excellent points. The final weights from rlm serve this diagnostic purpose, of course. -- Bert

-Original Message- From: Liaw, Andy [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 9:56 AM To: 'Berton Gunter'; 'r user'; 'rhelp' Subject: RE: [R] pros and cons of robust regression? (i.e. rlm vs lm)

To add to Bert's comments: - Normalizing data (e.g., subtracting mean and dividing by SD) can help numerical stability of the computation, but that's mostly unnecessary with modern hardware. As Bert said, that has nothing to do with robustness. - Instead of _replacing_ lm() with rlm() or other robust procedure, I'd do both of them. Some scientists view robust procedures that omit some data points (e.g., by assigning basically 0 weight to them) in automatic fashion and just trust the result as bad science, and I think they have a point. Use of a robust procedure does not free one from examining the data carefully and looking at diagnostics. Careful treatment of outliers is especially important, I think, for data coming from a confirmatory experiment. If the conclusion you draw depends on downweighting or omitting certain data points, you ought to have very good reason for doing so. I think it cannot be over-emphasized how important it is not to take outlier deletion lightly. I've seen many cases where what seemed like an outlier originally turned out to be legitimate data, and omission of them just led to an overly optimistic assessment of variability. Andy

From: Berton Gunter There is a **Huge** literature on robust regression, including many books that you can search on at e.g. Amazon. I think it fair to say that we have known since at least the 1970's that practically any robust downweighting procedure (see, e.g., M-estimation) is preferable (more efficient, better continuity properties, better estimates) to trimming outliers defined by arbitrary thresholds. An excellent but now probably dated introductory discussion can be found in UNDERSTANDING ROBUST AND EXPLORATORY DATA ANALYSIS edited by Hoaglin, Tukey, Mosteller, et al. The rub in all this is that nice small-sample inference results go out the window, though bootstrapping can help with this. Nevertheless, for a variety of reasons, my recommendation is simply to **never** use lm and **always** use rlm (with maybe a few minor caveats). Many would disagree with this, however. I don't think normalizing data as it's conventionally used has anything to do with robust regression, btw.
-- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA The business of the statistician is to catalyze the scientific learning process. - George E. P. Box

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of r user Sent: Thursday, April 06, 2006 8:51 AM To: rhelp Subject: [R] pros and cons of robust regression? (i.e. rlm vs lm) Can anyone comment or point me to a discussion of the pros and cons of robust regressions, vs. a more manual approach to trimming outliers and/or normalizing data used in regression analysis? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)
Spencer: Your comment reinforces Andy's point, which is that purported outliers must not be ignored but need to be clearly identified and examined. For reasons that you well understand, robust regression methods are better for this in the linear models context than standard least squares. However, as I understand it, the problem in the case that you describe is **not** that the outliers weren't identified and examined, but that they were and were dismissed as metrological anomalies. I would say this was an error of scientific (mis)judgment, not of data-analytic methodology. Anyway, this has taken us too far afield, so no more on-list comments from me. -- Bert

-Original Message- From: Spencer Graves [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 10:30 AM To: Berton Gunter Cc: 'Liaw, Andy'; 'r user'; 'rhelp' Subject: Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)

A great example of the hazards of automatic outlier rejection is the story of how the hole in the ozone layer in the southern hemisphere was discovered. Outliers were dutifully entered into the data base but discounted as probable metrology problems, which also plagued the investigation. As the percentage of outliers became excessive, investigators ultimately became convinced that many of the outliers were not metrology problems but real physical problems. For a recent discussion of this, see Maureen Christie (2004) 11. Data Collection and the Ozone Hole: Too much of a good thing? Proceedings of the International Commission on History of Meteorology 1.1, pp. 99-105 (www.meteohistory.org/2004proceedings1.1/pdfs/11christie.pdf). spencer graves p.s. I understand that Australia now has one of the world's highest rates of skin cancer, which has contributed to a major change in outdoor styles of dress there. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] difference between two density plots
Hello, I have two densities which I plot with: sm.density(gps) sm.density(gps2) Both data sets are 2D and have the same scale. Is there a way of plotting the difference between the two density plots? Thank you very much, Phil __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] skipping rows in trellis key
On 4/6/06, Steven Lacey [EMAIL PROTECTED] wrote: Deepayan, I also noticed that the width of the key object is off if the title of the key is bigger than the columns. For example,

xyplot(y ~ x, data = data.frame(x = 1:10, y = 1:10))
keyArgs <- list()
keyArgs <- list(points = list(pch = c(NA, rep(17, 5)), lwd = 2,
                    col = c(NA, c("red", "chartreuse3", "black", "cyan", "blue"))),
                text = list(lab = c("S-R Mapping", "Color", "Shape", "Letter",
                    "Compatible", "Incompatible"), cex = c(1.2, 1, 1, 1, 1, 1)),
                text = list(lab = c(expression(R^2), as.character(rep(0.999, 5))),
                    cex = c(1.2, 1, 1, 1, 1, 1)))
keyArgs$title <- "try this very, very, very long title"
keyArgs$cex.title <- 1.4
keyArgs$between <- c(1, 0, 5)
keyArgs$columns <- 1
keyArgs$column.between <- 1
keyArgs$border <- TRUE
drawKeyArgs <- list(key = keyArgs, draw = FALSE)
keyArgs$draw <- FALSE
key <- do.call(draw.key, drawKeyArgs)
vp1 <- viewport(x = 1, y = 1, height = unit(1, "grobheight", key),
                width = unit(1, "grobwidth", key), just = c("right", "top"))
pushViewport(vp1)
grid.draw(key)
popViewport()

What is a good workaround?

None I can think of (and it's not easy to fix given the current implementation). Bad workarounds are: (1) use space = "top" and border = FALSE (2) shorten the title by embedding newlines or shrinking the text size (3) add dummy columns (say rectangles) with transparent colors. Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Creating tables with std errors beneath estimates
I have recently become familiar with the latex() function thanks to the help of many people on this list. However, now I need to figure out how to make the right dataframe for the tables I want to make. What I need to do is make tables with the std. errors beneath the estimates in parentheses (standard econ style). How can I make a dataframe with that format? Thanks, Brian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
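One common recipe (a sketch, not from this thread; the names, numbers, and rounding are arbitrary) is to interleave coefficient rows with rows of parenthesized standard errors in a character matrix, then hand that to latex():

est <- c(educ = 0.085, exper = 0.012)        # made-up estimates
se  <- c(educ = 0.007, exper = 0.003)        # made-up standard errors
tab <- matrix("", nrow = 2 * length(est), ncol = 1,
              dimnames = list(rep("", 2 * length(est)), "Model 1"))
tab[seq(1, nrow(tab), 2), 1] <- format(round(est, 3))
tab[seq(2, nrow(tab), 2), 1] <- paste("(", format(round(se, 3)), ")", sep = "")
rownames(tab)[seq(1, nrow(tab), 2)] <- names(est)
tab   # pass to latex(), e.g. latex(tab, file = "estimates.tex")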
Re: [R] difference between two density plots
On 4/6/2006 1:34 PM, Philipp H. Mohr wrote: Hello, I have two densities which I plot with: sm.density(gps) sm.density(gps2) Both data sets are 2D and have the same scale. Is there a way of plotting the difference between the two density plots? ?sm.density describes the value returned by those function calls, which includes the values of the density estimate at the evaluation points. The evaluation points may be different between the two datasets, but you could use approxfun to generate functions doing linear interpolation for each of them, and then evaluate the density difference wherever you like. Duncan Murdoch __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)
Some people use the two terms sort of interchangeably. Beneficial in what sense? If there are no polynomial terms involving a variable, any linear transformation of a variable (by itself) does not change the quality of the fit at all. The scaling gets reflected in the coefficient and its SE, but t-statistics and p-values stay the same, as well as predictions, R-squared, etc. Even when there's a polynomial term, R-squared and predictions do not change. HTH, Andy

-Original Message- From: roger bos [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 1:15 PM To: Berton Gunter; Liaw, Andy Cc: rhelp Subject: Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)

I'm asking this question purely for my own benefit, not to try to correct anyone. The procedure you refer to as normalization I have always heard referred to as standardization. Is the former the proper term? Also, you say it's not necessary given today's hardware, but isn't it beneficial to get all the variables in a similar range? Is there any other transformation that you would suggest? I use rlm (and normalization) in the models I use every day, so I was happy to read the above comments. Thanks, Roger

On 4/6/06, Berton Gunter [EMAIL PROTECTED] wrote: Thanks, Andy. Well said. Excellent points. The final weights from rlm serve this diagnostic purpose, of course. -- Bert

-Original Message- From: Liaw, Andy [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 9:56 AM To: 'Berton Gunter'; 'r user'; 'rhelp' Subject: RE: [R] pros and cons of robust regression? (i.e. rlm vs lm)

To add to Bert's comments: - Normalizing data (e.g., subtracting mean and dividing by SD) can help numerical stability of the computation, but that's mostly unnecessary with modern hardware. As Bert said, that has nothing to do with robustness. - Instead of _replacing_ lm() with rlm() or other robust procedure, I'd do both of them. Some scientists view robust procedures that omit some data points (e.g., by assigning basically 0 weight to them) in automatic fashion and just trust the result as bad science, and I think they have a point. Use of a robust procedure does not free one from examining the data carefully and looking at diagnostics. Careful treatment of outliers is especially important, I think, for data coming from a confirmatory experiment. If the conclusion you draw depends on downweighting or omitting certain data points, you ought to have very good reason for doing so. I think it cannot be over-emphasized how important it is not to take outlier deletion lightly. I've seen many cases where what seemed like an outlier originally turned out to be legitimate data, and omission of them just led to an overly optimistic assessment of variability. Andy

From: Berton Gunter There is a **Huge** literature on robust regression, including many books that you can search on at e.g. Amazon. I think it fair to say that we have known since at least the 1970's that practically any robust downweighting procedure (see, e.g., M-estimation) is preferable (more efficient, better continuity properties, better estimates) to trimming outliers defined by arbitrary thresholds. An excellent but now probably dated introductory discussion can be found in UNDERSTANDING ROBUST AND EXPLORATORY DATA ANALYSIS edited by Hoaglin, Tukey, Mosteller, et al. The rub in all this is that nice small-sample inference results go out the window, though bootstrapping can help with this.
Nevertheless, for a variety of reasons, my recommendation is simply to **never** use lm and **always** use rlm (with maybe a few minor caveats). Many would disagree with this, however. I don't think normalizing data as it's conventionally used has anything to do with robust regression, btw. -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA The business of the statistician is to catalyze the scientific learning process. - George E. P. Box

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of r user Sent: Thursday, April 06, 2006 8:51 AM To: rhelp Subject: [R] pros and cons of robust regression? (i.e. rlm vs lm) Can anyone comment or point me to a discussion of the pros and cons of robust regressions, vs. a more manual approach to trimming outliers and/or normalizing data used in regression analysis? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
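Andy's invariance point is easy to verify numerically; a quick sketch with made-up data showing that rescaling a predictor changes the coefficient and its SE but leaves the t-statistic, p-value, and R-squared untouched:

set.seed(1)
x <- rnorm(50, mean = 100, sd = 15)
y <- 3 + 0.2 * x + rnorm(50)
f1 <- lm(y ~ x)
f2 <- lm(y ~ scale(x))           # standardized predictor
summary(f1)$coefficients         # different estimate and SE ...
summary(f2)$coefficients         # ... but the same t value and Pr(>|t|)
c(summary(f1)$r.squared, summary(f2)$r.squared)   # identical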
Re: [R] Mpirun with R CMD scripts
Your description was a bit hard for me to follow. I think what you're trying to do is something like

TestRmpi.R:

library(Rmpi)
if (0 == mpi.comm.rank(comm = 0)) {
    # 'manager', e.g.,
    cat("manager\n")
} else {
    # 'worker', e.g.,
    cat(mpi.comm.rank(comm = 0), "worker\n")
}
mpi.quit()

TestRmpi.sh:

R --slave < $1

and from the command line

mpirun -np 3 TestRmpi.sh TestRmpi.R

If instead you try mpirun -np 3 R CMD BATCH TestRmpi.R then each node executes R CMD BATCH. As a consequence, each node writes to a file TestRmpi.Rout, and you end up with an undetermined outcome (because the different nodes are all trying to write to the same file). If your file were

WRONGTestSnow.R:

library(snow)
cl <- makeCluster(3, type = "MPI")
clusterCall(cl, function() Sys.info()[c("nodename", "machine")])
stopCluster(cl)

then you would execute makeCluster on each node, for a total of 9 instances of R. Probably not what you want. I found that snow was useful for interactive and exploratory work, but that Rmpi (or rpvm) provides the level of control that you really need for more substantial use of parallel resources. Recent versions of Rmpi have mpi.parLapply and related functions, which are like the high-level functions in snow. Hope that helps, Martin

Srividya Valivarthi [EMAIL PROTECTED] writes: Hi, I am working on a 64-bit rocks cluster and am relatively new to the R package. I am trying to get snow working with R and Rmpi and have run into the following issue. R is able to load the Rmpi and snow libraries and is able to run simple commands both interactively and in batch, as follows:

[EMAIL PROTECTED] ~]$ R
R : Copyright 2005, The R Foundation for Statistical Computing
Version 2.2.0 (2005-10-06 r35749) ISBN 3-900051-07-0
> library(snow)
> c1 <- makeCluster(3, type = "MPI")
Loading required package: Rmpi
3 slaves are spawned successfully. 0 failed.
> clusterCall(c1, function() Sys.info()[c("nodename", "machine")])
[[1]]
nodename            machine
cheaha.ac.uab.edu   x86_64
[[2]]
nodename            machine
compute-0-12.local  x86_64
[[3]]
nodename            machine
compute-0-13.local  x86_64
> stopCluster(c1)
[1] 1
> q()

But on running the same script with mpirun I get the following error:

[EMAIL PROTECTED] ~]$ mpirun -np 3 R -slave R CMD BATCH TestSnow.R
/home/srividya/R/library/snow/RMPInode.sh: line 9: 19431 Segmentation fault
${RPROG:-R} --vanilla <<EOF >${OUT:-/dev/null} 2>&1
library(Rmpi)
library(snow)
runMPIslave()
EOF

I am not sure about what the error could be and any help is greatly appreciated. Thanks, Srividya __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Finding out the format of an object
Suppose I have an arbitrary R object. Is there a way to find out its format? There are 118 points, each described by two numbers. Let the name of the object be "obj" (without the quotes). I can do a print(obj), but all I get is a bunch of numbers. I can do an ls.str(obj), but all I get is a bunch of numbers. Is it a data frame? A vector with 118 elements, each having two sub-elements? Or ---? The object in question is of the class gam (Generalized Additive Model). I know how to pull out some data with the predict function, but, when I try to print the value of the predict function, all I get is a bunch of numbers. Also, how could I have looked this up without having to post a question here? Tom Thomas L. Jones, Ph.D., Computer Science __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Finding out the format of an object
On 4/6/2006 2:24 PM, Thomas L Jones wrote: Suppose I have an arbitrary R object. Is there a way to find out its format? There are 118 points, each described by two numbers. Let the name of the object be "obj" (without the quotes). I can do a print(obj), but all I get is a bunch of numbers. I can do an ls.str(obj), but all I get is a bunch of numbers. Is it a data frame? A vector with 118 elements, each having two sub-elements? Or ---? The object in question is of the class gam (Generalized Additive Model). I know how to pull out some data with the predict function, but, when I try to print the value of the predict function, all I get is a bunch of numbers. str(obj) usually gives you the sort of thing you're looking for. Also, how could I have looked this up without having to post a question here? This is a tricky question. Usually RSiteSearch() will turn up useful information, but I suspect you'll get a lot of false hits if you look for "format" or "structure". Duncan Murdoch __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
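A small illustration of the tools Duncan mentions, using lm() as a stand-in for a fitted gam object so it runs without extra packages:

fit <- lm(dist ~ speed, data = cars)
class(fit)               # "lm": tells you which methods apply
str(fit, max.level = 1)  # named components with their types and sizes
names(fit)               # what you can extract with fit$...
str(predict(fit))        # a plain named numeric vector, hence "a bunch of numbers"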
[R] Indexing With List Of Vectors (Replacement)
I have the following:

a <- matrix(1:10, nrow = 2, byrow = TRUE)
b <- array(as.integer(0), c(7, 5))
idx <- list()
length(idx) <- 2
dim(idx) <- c(1, 2)
idx[[1]] <- as.integer(1:2)
idx[[2]] <- as.integer(1:5)

I can do the following, which works if 'b' is a matrix.

b[idx[[1]], idx[[2]]] <- a
b
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
[3,]    0    0    0    0    0
[4,]    0    0    0    0    0
[5,]    0    0    0    0    0
[6,]    0    0    0    0    0
[7,]    0    0    0    0    0

Looking for a way to do this generically such that 'idx' with 'n' length can be used to index n-dimensional arrays.

b[translate 'idx' to array indices] <- a

-- SIGSIG -- signature too long (core dumped) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Finding out the format of an object
Hi Tom, If you assume that's a common question, and so likely to be in base, help.search("object", package = "base") calls up a list of functions relating to objects. (Leaving off the package argument calls up a much longer one.) Browsing through the list gets you the intriguing line: class(base) Object Classes ?class gets you more information about that function, which sounds promising. class(myobject) will tell you what class(es) myobject belongs to. names(myobject) will show you the named subcomponents, which can be accessed with myobject$somename The object in question is of the class gam (Generalized Additive Model). I know how to pull out some data with the predict function, but, when I try to print the value of the predict function, all I get is a bunch of numbers. I'm not sure what you mean by "value of the predict function", but you could try typing just predict.gam at the prompt to get some idea of what it's doing (although not necessarily the complete source). Sarah -- Sarah Goslee http://www.stringpage.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] algebra
hello all. i have been trying to develop a representation (in the S4 sense) for a floating cash object, which would store a cash amount as a function of an arbitrary number of variables (with unknown values). for example, an interest rate swap may call for a payment in one year that can be represented as a function of a 3-month libor rate to be determined in nine months. this floating cash amount would be represented as, say, 25*L (i am omitting some of the details leading to the scalar), where L is the unknown rate. it will be necessary for addition and multiplication operations (among others) to make sense for the class. i haven't found R to be particularly accommodating of this type of problem, in particular of the algebraic computations necessary to make addition and multiplication work. my best thought so far was to have one slot in the representation be an expression object that would evaluate to a numeric cash object, if the evaluation were to take place in an environment where all the variables were ultimately bound to numerics (in the example above, expression(25*L) would fill the slot). this approach leads me to the following questions: (i) does R have any native computer algebra functionality that would allow expression(25*L) + expression(25*M) to evaluate to [1] expression(25*(L+M)) (ii) if not, does anyone have experience with integrating available computer algebra platforms (such as Mathomatic, which Gabor discussed in an earlier thread, but maybe not Mathomatic, because of its lack of natural log support) with R, i.e, passing expressions out for algebraic computations and then returning the result? are there any packages available to provide APIs to any systems? Any open source systems? (iii) is there another way to handle all this other than algebraic manipulation of expressions? even though i can't complete the thought, i keep picturing a massive n-dimensional array (if n is the total number of unknowns i am dealing with) that would somehow store scalars associated with all the possible combinations of variables? does that thought have a future? (please ignore this last question if i am not, as is often the case, making sense) thanks in advance for any help. franklin parlamis __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
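On question (i): base R has no symbolic simplification (D() differentiates expressions, but nothing will rewrite 25*L + 25*M as 25*(L+M)). What it can do is build up unevaluated expressions and defer evaluation until the unknowns are bound, which may be enough for the cash-flow use case; a minimal sketch, not from the thread, using the names from the post:

e1 <- expression(25 * L)[[1]]
e2 <- expression(25 * M)[[1]]
total <- bquote(.(e1) + .(e2))         # splice into a new call; no simplification occurs
total                                  # 25 * L + 25 * M
eval(total, list(L = 0.05, M = 0.03))  # 2, once the rates are known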
Re: [R] difference between two density plots
On Thu, 6 Apr 2006, Duncan Murdoch wrote: On 4/6/2006 1:34 PM, Philipp H. Mohr wrote: Hello, I have two densities which I plot with: sm.density(gps) sm.density(gps2) Both data sets are 2D and have the same scale. Is there a way of plotting the difference between the two density plots? ?sm.density describes the value returned by those function calls, which includes the values of the density estimate at the evaluation points. The evaluation points may be different between the two datasets, but you could use approxfun to generate functions doing linear interpolation for each of them, and then evaluate the density difference wherever you like. The eval.points can be set using sm.options(), fixing them to be the same. Duncan Murdoch __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
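Putting the two suggestions together: forcing both fits onto the same evaluation grid makes the estimates directly subtractable. A sketch, assuming gps and gps2 are two-column matrices and that xlim/ylim/ngrid are accepted as sm options here (my reading of the sm documentation):

library(sm)
xl <- range(gps[, 1], gps2[, 1])
yl <- range(gps[, 2], gps2[, 2])
d1 <- sm.density(gps,  xlim = xl, ylim = yl, ngrid = 50, display = "none")
d2 <- sm.density(gps2, xlim = xl, ylim = yl, ngrid = 50, display = "none")
dif <- d1$estimate - d2$estimate        # same grid, so the difference is well defined
contour(d1$eval.points[, 1], d1$eval.points[, 2], dif)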
[R] JSS
There are some nice new R contributions in the latest volume of JSS http://www.jstatsoft.org The most recent one (and a good model for similar comparative work) is Authors: Alexandros Karatzoglou and David Meyer and Kurt Hornik Title: Support Vector Machines in R Reference: Volume 15, 2006, Issue 9 Dates: submitted: 2005-10-24 accepted: 2006-04-06 If you want to be kept abreast of whatever is published, subscribe to jss-announce at http://lists.stat.ucla.edu/mailman/listinfo/jss-announce === Jan de Leeuw; Distinguished Professor and Chair, UCLA Department of Statistics; Editor: Journal of Multivariate Analysis, Journal of Statistical Software US mail: 8125 Math Sciences Bldg, Box 951554, Los Angeles, CA 90095-1554 phone (310)-825-9550; fax (310)-206-5658; email: [EMAIL PROTECTED] .mac: jdeleeuw ++ aim: deleeuwjan ++ skype: j_deleeuw homepages: http://gifi.stat.ucla.edu ++ http://www.cuddyvalley.org - No matter where you go, there you are. --- Buckaroo Banzai http://gifi.stat.ucla.edu/sounds/nomatter.au __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)
There are several kinds of standardization, and 'normalization' is only one of them. For some details you could check http://support.sas.com/91doc/getDoc/statug.hlp/stdize_index.htm (see Details for standardization methods). Standardization is required prior to clustering to control for the impact of scale. (Variables with large variances tend to have more effect on the resulting clusters than those with small variances.) I don't know how valuable standardization may be in other areas.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of roger bos Sent: Thursday, April 06, 2006 1:15 PM To: Berton Gunter; Liaw, Andy Cc: rhelp Subject: Re: [R] pros and cons of robust regression? (i.e. rlm vs lm)

I'm asking this question purely for my own benefit, not to try to correct anyone. The procedure you refer to as normalization I have always heard referred to as standardization. Is the former the proper term? Also, you say it's not necessary given today's hardware, but isn't it beneficial to get all the variables in a similar range? Is there any other transformation that you would suggest? I use rlm (and normalization) in the models I use every day, so I was happy to read the above comments. Thanks, Roger

On 4/6/06, Berton Gunter [EMAIL PROTECTED] wrote: Thanks, Andy. Well said. Excellent points. The final weights from rlm serve this diagnostic purpose, of course. -- Bert

-Original Message- From: Liaw, Andy [mailto:[EMAIL PROTECTED] Sent: Thursday, April 06, 2006 9:56 AM To: 'Berton Gunter'; 'r user'; 'rhelp' Subject: RE: [R] pros and cons of robust regression? (i.e. rlm vs lm)

To add to Bert's comments: - Normalizing data (e.g., subtracting mean and dividing by SD) can help numerical stability of the computation, but that's mostly unnecessary with modern hardware. As Bert said, that has nothing to do with robustness. - Instead of _replacing_ lm() with rlm() or other robust procedure, I'd do both of them. Some scientists view robust procedures that omit some data points (e.g., by assigning basically 0 weight to them) in automatic fashion and just trust the result as bad science, and I think they have a point. Use of a robust procedure does not free one from examining the data carefully and looking at diagnostics. Careful treatment of outliers is especially important, I think, for data coming from a confirmatory experiment. If the conclusion you draw depends on downweighting or omitting certain data points, you ought to have very good reason for doing so. I think it cannot be over-emphasized how important it is not to take outlier deletion lightly. I've seen many cases where what seemed like an outlier originally turned out to be legitimate data, and omission of them just led to an overly optimistic assessment of variability. Andy

From: Berton Gunter There is a **Huge** literature on robust regression, including many books that you can search on at e.g. Amazon. I think it fair to say that we have known since at least the 1970's that practically any robust downweighting procedure (see, e.g., M-estimation) is preferable (more efficient, better continuity properties, better estimates) to trimming outliers defined by arbitrary thresholds. An excellent but now probably dated introductory discussion can be found in UNDERSTANDING ROBUST AND EXPLORATORY DATA ANALYSIS edited by Hoaglin, Tukey, Mosteller, et al. The rub in all this is that nice small-sample inference results go out the window, though bootstrapping can help with this.
Nevertheless, for a variety of reasons, my recommendation is simply to **never** use lm and **always** use rlm (with maybe a few minor caveats). Many would disagree with this, however. I don't think normalizing data as it's conventionally used has anything to do with robust regression, btw. -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA The business of the statistician is to catalyze the scientific learning process. - George E. P. Box -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of r user Sent: Thursday, April 06, 2006 8:51 AM To: rhelp Subject: [R] pros and cons of robust regression? (i.e. rlm vs lm) Can anyone comment or point me to a discussion of the pros and cons of robust regressions, vs. a more manual approach to trimming outliers and/or normalizing data used in regression analysis? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
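The clustering point is easy to demonstrate with scale(), which does the subtract-mean/divide-by-SD standardization in one call (made-up data; kmeans stands in for any distance-based method):

set.seed(7)
X <- cbind(income = rnorm(100, 50000, 10000),   # large variance
           age    = rnorm(100, 40, 10))         # small variance by comparison
km.raw <- kmeans(X, centers = 2)         # distances dominated by income
km.std <- kmeans(scale(X), centers = 2)  # both variables in SD units
table(km.raw$cluster, km.std$cluster)    # the two partitions typically differ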
[R] Evaluating a function with another function
Hello all, I hope someone can help me with this. I have a function that calculates two values based on input data. A simple example follows:

test <- function(x, s, rangit = seq(0, 10, 1)) {
  rangit <- rangit
  y <- vector()
  p <- vector()
  for (i in 0:length(rangit)) {
    y[i] <- x + s[1] + rangit[i]
    p[i] <- x + s[2] + rangit[i]
  }
  return(data.frame(rangit, y, p))
}

And it returns the following (here for x = 1, s = c(1, 2)):

   rangit  y  p
1       0  2  3
2       1  3  4
3       2  4  5
4       3  5  6
5       4  6  7
6       5  7  8
7       6  8  9
8       7  9 10
9       8 10 11
10      9 11 12
11     10 12 13

Which is what I want. The part I am having trouble with is that I want to write another function that will evaluate the previous function at multiple levels of x, where the levels of x may not always be the same length. I have tried several options but so far have not been able to figure it out. I tried:

testexp <- function(x, s, rangit) {
  out <- data.frame()
  for (j in 1:length(x)) {
    out[j] <- test(x[j], s)
  }
  return(out)
}
Q2 <- testexp(x = c(1:4), s = c(1, 2))

But that returns a warning with no values. I have also tried various other methods with no success. Basically what I want the output to look like is:

Out[1]         Out[2]         Out[3]         Out[4]
rangit  y  p   rangit  y  p   rangit  y  p   rangit  y  p
0       2  3   0       3  4   0       4  5   0       5  6
1       3  4   1       4  5   1       5  6   1       6  7
2       4  5   2       5  6   2       6  7   2       7  8
...and so on through rangit = 10.

Thanks for any help you can offer. Cameron Guenther, Ph.D. Associate Research Scientist FWC/FWRI, Marine Fisheries Research 100 8th Avenue S.E. St. Petersburg, FL 33701 (727)896-8626 Ext. 4305 [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] More Logistic Regression Tools?
Frank E Harrell Jr wrote: Eric Rescorla [EMAIL PROTECTED] wrote: (2) I'd like to compute goodness-of-fit statistics for my fit (Hosmer-Lemeshow, Pearson, etc.). I didn't see a package that did this. Have I missed one? Hosmer-Lemeshow has low power and relies on arbitrary binning of predicted probabilities. The Hosmer-Le Cessie omnibus test is unique and has more power usually. To get it:

f <- update(f, x = T, y = T)
resid(f, 'gof')  # uses residuals.lrm

The documentation of the Design package, section residuals.lrm, says: "For the goodness-of-fit test, the le Cessie-van Houwelingen normal test statistic for the unweighted sum of squared errors (Brier score times n) is used." It is not clear to me whether the test implemented is for the statistic with a constant kernel described in S. le Cessie and J.C. van Houwelingen, A goodness-of-fit test for binary regression models, based on smoothing methods. Biometrics, 47:1267-1282, Dec 1991, or the variation in D.W. Hosmer, T. Hosmer, S. Le Cessie, and S. Lemeshow, A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine, 16:965-980, 1997. Sorry, I couldn't get this from looking at the code either (I'm quite new to R). Is the statistic tested the T defined by le Cessie and van Houwelingen? Regards, -- Ramón Casero Cañas http://www.robots.ox.ac.uk/~rcasero/wiki http://www.robots.ox.ac.uk/~rcasero/blog __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] for bclust in package e1071
Hi All, Could you please help with the error I get from the following code?

library(e1071)    # provides bclust()
library(cluster)  # provides clara(), used as the base method
data(iris)
bc1 <- bclust(iris[, 2:5], 3, base.method = "clara", base.centers = 5)

Then I got: Committee Member: 1(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(19)(20) Error in bclust(iris[, 2:5], 3, base.method = "clara", base.centers = 5): Could not find valid cluster solution in 20 replications I don't quite understand what the error means here and how to fix it? Thank you! __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] reshape question
Hi, I have a data frame like this:

date  column1  column2  column3  value1  value2  value3
1-1   A        B        C        10      5       2
2-1   A        B        D        5       2       0
3-1   A        B        E        17      10      7

How can I reshape it to:

date  column1  column2  column3  v       x
1-1   A        B        C        value1  10
1-1   A        B        C        value2  5
1-1   A        B        C        value3  2
2-1   A        B        D        value1  5
2-1   A        B        D        value2  2
2-1   A        B        D        value3  0
3-1   A        B        E        value1  17
3-1   A        B        E        value2  10
3-1   A        B        E        value3  7

Thx! Regards, Richard __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Evaluating a function with another function
I assume you want a list containing length(x) data frames, one per list component. If that's it, try this:

test <- function(x = 1:4, s = 1:2, rangit = 0:11)
    lapply(x, function(x) data.frame(rangit, y = x + s[1] + rangit,
                                     p = x + s[2] + rangit))
test()  # test run

On 4/6/06, Guenther, Cameron [EMAIL PROTECTED] wrote: Hello all, I hope someone can help me with this. I have a function that calculates two values based on input data. A simple example follows: test <- function(x, s, rangit = seq(0,10,1)) { rangit <- rangit; y <- vector(); p <- vector(); for (i in 0:length(rangit)) { y[i] <- x+s[1]+rangit[i]; p[i] <- x+s[2]+rangit[i] }; return(data.frame(rangit, y, p)) } And returns the following: rangit y p 1 0 2 3 2 1 3 4 3 2 4 5 4 3 5 6 5 4 6 7 6 5 7 8 7 6 8 9 8 7 9 10 9 8 10 11 10 9 11 12 11 10 12 13 Which is what I want. The part I am having trouble with is that I want to write another function that will evaluate the previous function at multiple levels of x, where the levels of x may not always be the same length. I have tried several options but so far have not been able to figure it out. I tried: testexp <- function(x, s, rangit) { out <- data.frame(); for (j in 1:length(x)) { out[j] <- test(x[j], s) }; return(out) } Q2 <- testexp(x = c(1:4), s = c(1,2)) But that returns a warning with no values. I have also tried various other methods with no success. Thanks for any help you can offer. Cameron Guenther, Ph.D. Associate Research Scientist FWC/FWRI, Marine Fisheries Research 100 8th Avenue S.E. St. Petersburg, FL 33701 (727)896-8626 Ext. 4305 [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Indexing With List Of Vectors (Replacement)
Try this:

do.call("[<-", c(list(b), idx, list(a)))

On 4/6/06, Paul Roebuck [EMAIL PROTECTED] wrote: I have the following:

a <- matrix(1:10, nrow = 2, byrow = TRUE)
b <- array(as.integer(0), c(7, 5))
idx <- list()
length(idx) <- 2
dim(idx) <- c(1, 2)
idx[[1]] <- as.integer(1:2)
idx[[2]] <- as.integer(1:5)

I can do the following, which works if 'b' is a matrix.

b[idx[[1]], idx[[2]]] <- a

Looking for a way to do this generically such that 'idx' with 'n' length can be used to index n-dimensional arrays.

b[translate 'idx' to array indices] <- a

-- SIGSIG -- signature too long (core dumped) __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
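The idiom generalizes because each element of idx becomes one positional index argument to "[<-" and the last list element becomes the replacement value. A quick made-up 3-dimensional check:

b   <- array(0L, c(4, 4, 4))
idx <- list(1:2, 1:3, 2:4)             # one index vector per dimension
a   <- array(1:18, c(2, 3, 3))         # dims match the index lengths
b   <- do.call("[<-", c(list(b), idx, list(a)))
identical(b[1:2, 1:3, 2:4], a)         # TRUE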
Re: [R] reshape question
Just follow this example from example(reshape):

reshape(wide, idvar = "Subject", varying = list(names(wide)[2:12]),
        v.names = "conc", direction = "long")

like this:

long <- reshape(dd, direction = "long", varying = list(names(dd)[5:7]),
                timevar = "v", v.names = "x",
                times = paste("value", 1:3, sep = ""))
long[order(long$date), ]

It's even simpler if you use the reshape package:

library(reshape)
long <- melt(dd, id = 1:4)
long[order(long$date), ]

On 4/6/06, Richard van Wingerden [EMAIL PROTECTED] wrote: Hi, I have a data frame like this: date column1 column2 column3 value1 value2 value3 / 1-1 A B C 10 5 2 / 2-1 A B D 5 2 0 / 3-1 A B E 17 10 7 How can I reshape it to long format with columns date, column1, column2, column3, v, x? Thx! Regards, Richard __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] problem with statetable function in msm
Just a note to say if you use msm you may find that statetable returns an incorrect statetable: you can then build an incorrect model and not even know that statetable was the villain. shfets __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Referencing variables in a dataframe.
I have a question about how to reference variables in a dataframe. Normally, after I have read in some Stata data using the following command all - read.dta('all.dta') Whenever I want to use the variable sat.vr1 in the all data frame, I do so using all$sat.vr1 However, I'd like to be able to use the sat.vr1 variable without the all$ (as well as all of the other variables of the dataframe). Is there some way to make a particular dataframe the active or default one so that I can do this? Thanks, Brian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Referencing variables in a dataframe.
Here are two ways:

with(iris, Sepal.Length + Sepal.Width)

attach(iris)
Species

On 4/6/06, Brian Quinif [EMAIL PROTECTED] wrote: I have a question about how to reference variables in a dataframe. Normally, after I have read in some Stata data using the following command all - read.dta('all.dta') Whenever I want to use the variable sat.vr1 in the all data frame, I do so using all$sat.vr1 However, I'd like to be able to use the sat.vr1 variable without the all$ (as well as all of the other variables of the dataframe). Is there some way to make a particular dataframe the active or default one so that I can do this? Thanks, Brian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Financial functions
R has many different functions for financial computations, but I don't know if you will find them all together. Are you familiar with RSiteSearch (including the second restrict argument)? If you can't find what you want, please submit another post (after first reading the posting guide! www.R-project.org/posting-guide.html; the guide has helped people answer their own questions and, failing that, posts more consistent with that guide tend to receive quicker, more useful answers). hope this helps, spencer graves vittorio wrote: In what R package(-s) can I find the entire set of financial functions that you can find in MS-Excel such as PMT, PPMT, FV and IPMT? Ciao Vittorio __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Financial functions
I don't know what PMT, etc., do, so I just give some hints and maybe it is not so useful. There are several packages in CRAN related to finance. Have you tried to find some functions in them?

fBasics    Rmetrics - Markets and Basic Statistics
fCalendar  Rmetrics - Chronological Objects
fExtremes  Rmetrics - Extreme Financial Market Data
fMultivar  Rmetrics - Multivariate Market Analysis
fOptions   Rmetrics - Option Valuation
fPortfolio Rmetrics - Portfolio Selection and Optimization
fSeries    Rmetrics - The Dynamical Process Behind Markets

2006/4/5, vittorio [EMAIL PROTECTED]: In what R package(-s) can I find the entire set of financial functions that you can find in MS-Excel such as PMT, PPMT, FV and IPMT? Ciao Vittorio __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- 黄荣贵 Department of Sociology Fudan University __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Financial functions
Those functions in Excel are for computing things like interest payment for fixed interest rate loan, etc. RSiteSearch() doesn't turn up anything useful, so I'd guess no one has written them. Good opportunity for people to contribute, I guess. Andy From: Spencer Graves R has many different functions for financial computations, but I don't know if you will find them all together. Are you familiar with RSiteSearch (including the second restrict argument)? If you can't find what you want, please submit another post (after first reading the posting guide! www.R-project.org/posting-guide.html; the guide has helped people answer their own questions and, failing that, posts more consistent with that guide tend to receive quicker, more useful answers). hope this helps, spencer graves vittorio wrote: In what R package(-s) can I find the entire set of financial functions that you can find in MS-Excel such as PMT, PPMT, FV and IPMT? Ciao Vittorio __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
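For anyone taking up that suggestion, the arithmetic behind PMT is the standard annuity formula; a minimal sketch (ignoring Excel's sign convention and its optional fv/type arguments):

pmt <- function(rate, nper, pv) {
  # periodic payment that amortizes pv over nper periods at rate per period
  if (rate == 0) return(pv / nper)
  pv * rate / (1 - (1 + rate)^(-nper))
}
pmt(0.06 / 12, 360, 200000)   # about 1199.10: a 30-year loan at 6% nominal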
Re: [R] Financial functions
See the first two hits for:

RSiteSearch("IRR")

On 4/6/06, Liaw, Andy [EMAIL PROTECTED] wrote: Those functions in Excel are for computing things like interest payment for fixed interest rate loan, etc. RSiteSearch() doesn't turn up anything useful, so I'd guess no one has written them. Good opportunity for people to contribute, I guess. Andy From: Spencer Graves R has many different functions for financial computations, but I don't know if you will find them all together. Are you familiar with RSiteSearch (including the second restrict argument)? If you can't find what you want, please submit another post (after first reading the posting guide! www.R-project.org/posting-guide.html; the guide has helped people answer their own questions and, failing that, posts more consistent with that guide tend to receive quicker, more useful answers). hope this helps, spencer graves vittorio wrote: In what R package(-s) can I find the entire set of financial functions that you can find in MS-Excel such as PMT, PPMT, FV and IPMT? Ciao Vittorio __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Command line support tools - suggestions?
Dear Community, I'm interested in developing a package that could ease the command-line learning curve for new users. It would provide more detailed syntax checking and commentary as feedback. It would try to anticipate common new-user errors, and provide feedback to help correct them. As a trivial example, instead of

mean(c(1,2,NA))
[1] NA

we might have

mean(c(1,2,NA))
[1] NA
Warning: your data contain missing values. If you wish to ignore these to compute a mean, use na.rm=TRUE.

I'm interested in any thoughts that people have about this idea - what errors do you commonly see, and how can they be dealt with? I have applied for funding for a couple of months of programming support for this idea, and will find out whether or not I get it later this month. Until then, I thought that it would be interesting to kick around some ideas for the sorts of changes that we could try. Cheers, Andrew -- Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-9763 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 Email: [EMAIL PROTECTED] http://www.ms.unimelb.edu.au __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
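One cheap way to prototype that kind of feedback (a sketch only; the function name is made up, and a real package would hook in less intrusively than shadowing base functions):

mean2 <- function(x, ...) {
  # advise on the common new-user mistake before dispatching to mean()
  if (any(is.na(x)) && !("na.rm" %in% names(list(...))))
    message("Warning: your data contain missing values. If you wish to ",
            "ignore these to compute a mean, use na.rm=TRUE.")
  mean(x, ...)
}
mean2(c(1, 2, NA))   # returns NA, plus the advisory message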
[R] rpart.predict error--subscript out of bounds
Hi, I am using rpart to do leave-one-out cross-validation, but met some problems. Data is a data frame; the first column is the subject id, the second column is the group id, and the rest of the columns are numerical variables. Data[1:5,1:10] sub.id group.id X3262.345 X3277.402 X3369.036 X3439.895 X3886.935 X3939.054 X3953.777 X3970.352 1 32613 HAM_TSP 417.7082 430.4895 619.4776 720.8246 1048.1290 374.4598 770.4653 646.8712 2 32616 HAM_TSP 2553.5913 1113.4997 225.5340 274.5644 1105.7908 363.4623 2380.1872 784.9196 3 32617 HAM_TSP 291.6596 314.7322 238.3287 315.8305 982.6276 299.5855 1059.1140 540.8592 4 32628 HAM_TSP 456.9504 508.2278 552.3632 719.9989 1306.6299 446.6184 1352.9955 867.4219 5 32629 HAM_TSP 898.8879 640.2680 342.5477 386.5816 811.6709 518.0244 715.9886 441.1622

For example, I use the first sample as the test set, the rest as the training set:

fit <- rpart(as.factor(Data[-1, 2]) ~ ., Data[-1, -c(1:2)], minbucket = 2)
predict(fit, Data[1, ], type = 'prob')
Error in predict.rpart(fit, Data[1, ]) : subscript out of bounds

but when I change the parameter type to 'class' it works well:

predict(fit, Data[1, -c(1:2)], type = 'class')
[1] HTLV_Carrier
Levels: HAM_TSP HTLV_Carrier Leukemia Normal

Could anyone tell me what the problem is? Thanks very much in advance! __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Command line support tools - suggestions?
I'm interested in any thoughts that people have about this idea - what errors do you commonly see, and how can they be dealt with? NAs introduced by coercion __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Using vectorization instead of for loop for performing a calculation efficiently
Thanks. OK, so I feel silly now ... I think I need to get a more thorough text and work through some examples. 'replace' does what I was trying to do manually. Thanks for the tip, this really is MUCH faster.

-Original Message- From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] Sent: Saturday, April 01, 2006 7:58 PM To: Peter Wilkinson Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Using vectorization instead of for loop for performing a calculation efficiently

Try this:

set.seed(1)
mat <- matrix(rnorm(2 * 10), 2)
system.time(mat2 <- replace(mat, rowSums(mat > 0) == 10, NA))
[1] 0.04 0.01 0.05 NA NA
R.version.string  # Windows XP
[1] "R version 2.2.1, 2005-12-20"

On 4/1/06, Peter Wilkinson [EMAIL PROTECTED] wrote: I am trying to write an efficient function that will do the following: Given an nxm matrix, 10 rows (observations) by 10 columns (samples): for each row, test if all values in the row are greater than a value k. If all values are greater than k, then set all values to NA (or something). Return an nxm matrix with the modified rows. If I do this with a matrix of 20,000 rows, I will be waiting until Christmas for it to finish:

for rows in Matrix:
    if row > filter
        set all elements in row to NA (or something)
    else
        do nothing
return the matrix with the modified rows

I don't know how to code this properly. The following:

if (sum(ifelse(nvector > filter, 1, 0)) == 0)

tells me if any row has at least 1 value above the filter. How do I get rid of the 'outer' loop? Peter - Peter Wilkinson Senior Bioinformatician / Programmer-Analyst National Immune Monitoring Laboratory tel: (514)-343-7876 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] creating files using for loop
I would like to use a for loop to run estimations on 12 different subsets of a dataset. I have the basics of my script figured out, but I am having problems getting the loop to name some files as I would like. Here is a sketch of my code:

sub.dataset <- c(101, 201)  # assume I only have two subsets, which I call 101 and 201
for (i in 1:length(sub.dataset)) {
  # Estimation commands here... unimportant for this question
  # Estimates <- matrix of estimation results I will write out to LaTeX
  latex(Estimates, file=sub.dataset[i].tex)
}

So, the latex command is what's problematic here. How can I get my for loop to put file='101.tex' and file='201.tex' (or many similar things) where I want it to. I could just as easily want to put something like 101$variable1 or 201$variable1. Sorry for such a basic question regarding for loops, but thanks for the help. Brian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] creating files using for loop
See ?paste and try:

paste(sub.dataset[i], "tex", sep = ".")

On 4/7/06, Brian Quinif [EMAIL PROTECTED] wrote: I would like to use a for loop to run estimations on 12 different subsets of a dataset. I have the basics of my script figured out, but I am having problems getting the loop to name some files as I would like. Here is a sketch of my code: sub.dataset <- c(101, 201) # assume I only have two subsets, which I call 101 and 201 for (i in 1:length(sub.dataset)) { # Estimation commands here... unimportant for this question # Estimates <- matrix of estimation results I will write out to LaTeX latex(Estimates, file=sub.dataset[i].tex) } So, the latex command is what's problematic here. How can I get my for loop to put file='101.tex' and file='201.tex' (or many similar things) where I want it to. I could just as easily want to put something like 101$variable1 or 201$variable1. Sorry for such a basic question regarding for loops, but thanks for the help. Brian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] creating files using for loop
I tried that, and the problem seems to be that the results of the 'paste'ing are enclosed in quotes... that is, the paste yields '101.tex'. Of course, that is what I said I wanted above, and it should work for that particular instance, but I also want to be able to create estimates.101 and estimates.201, for example. That is, I want to be able to just generate the value of sub.dataset[i] and follow it or precede it by whatever I choose, not necessarily quotation marks. Thanks, BQ

2006/4/7, Gabor Grothendieck [EMAIL PROTECTED]: See ?paste and try: paste(sub.dataset[i], "tex", sep = ".") On 4/7/06, Brian Quinif [EMAIL PROTECTED] wrote: I would like to use a for loop to run estimations on 12 different subsets of a dataset. I have the basics of my script figured out, but I am having problems getting the loop to name some files as I would like. Here is a sketch of my code: sub.dataset <- c(101, 201) # assume I only have two subsets, which I call 101 and 201 for (i in 1:length(sub.dataset)) { # Estimation commands here... unimportant for this question # Estimates <- matrix of estimation results I will write out to LaTeX latex(Estimates, file=sub.dataset[i].tex) } So, the latex command is what's problematic here. How can I get my for loop to put file='101.tex' and file='201.tex' (or many similar things) where I want it to. I could just as easily want to put something like 101$variable1 or 201$variable1. Sorry for such a basic question regarding for loops, but thanks for the help. Brian __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
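The thread ends here, but the usual answers (not from this thread) are to keep per-subset results in a named list, or, if variables named estimates.101 and the like are really wanted, to build the names with paste() and use assign()/get(); a sketch, with latex() from Hmisc as in the original post and a placeholder matrix standing in for the real estimation results:

sub.dataset <- c(101, 201)
estimates <- list()     # a named list usually beats synthetic variable names
for (i in 1:length(sub.dataset)) {
  nm <- as.character(sub.dataset[i])
  estimates[[nm]] <- matrix(rnorm(4), 2)            # placeholder for real results
  latex(estimates[[nm]], file = paste(nm, "tex", sep = "."))
  assign(paste("estimates", nm, sep = "."), estimates[[nm]])  # creates estimates.101
}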