[R] aggregate factor
Hi,

I am using aggregate to compute means for later plotting. There are two factors involved, and the problem is that the values of the second factor (Age) in the means are not in the right order, because 10 comes between 1 and 2. What I really want is the numeric value of Age, but as.numeric and as.integer return the internal level codes instead. Is there a way to easily get the numeric value?

I am using Windows R 2.5.1.

Thanks,

str(fishdata)
'data.frame': 372 obs. of 6 variables:
 $ Lake: Factor w/ 3 levels "EVANS","JOLLIET",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Age : int 1 1 1 1 1 1 1 1 1 1 ...
 $ TL  : int 132 120 125 115 130 120 115 110 117 116 ...
 $ W   : int 10 10 10 10 10 10 10 10 10 20 ...
 $ Sex : Factor w/ 3 levels "F","I","M": 1 1 2 2 2 1 1 1 2 2 ...
 $ WT  : num 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 ...

fishdatameans=aggregate(fishdata$TL, list(Lake = fishdata$Lake, Age=fishdata$Age), mean)

# Now Age is a Factor but 10 is in the wrong position.
fishdatameans$Age
 [1] 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 6 6 6 7 8 9 10
Levels: 0 1 10 2 3 4 5 6 7 8 9

as.numeric(fishdatameans$Age)
 [1] 1 2 2 2 4 4 4 5 5 5 6 6 6 7 7 8 8 8 9 10 11 3

# What I want is
 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 6 6 6 7 8 9 10

Bill

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
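A common way to recover the numbers from a factor such as Age is to convert through its labels rather than its internal codes. The following is a minimal sketch, not taken from the thread; the factor `age` and its values are made up for illustration and only mimic the level ordering shown in the post.

```r
# Illustrative factor built from character labels, so the levels sort as
# strings and "10" lands between "1" and "2", as in the post
age <- factor(as.character(c(0, 1, 1, 2, 3, 10)))
levels(age)

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(age)

# Converting via the labels gives the original numbers back
age_num <- as.numeric(as.character(age))

# Equivalent alternative: convert the levels once, then index by the factor
age_num2 <- as.numeric(levels(age))[age]
```

Applied to the data in the post, this would presumably read fishdatameans$Age <- as.numeric(as.character(fishdatameans$Age)), after which sorting and plotting by Age use numeric order.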
Re: [R] RODBC
I have now read the README file, which I should have done before. :-[ Sorry.

To summarize:
- Install the ODBC connector driver (3.51)
- Set up the DSN in the file .odbc.ini
- It works beautifully and RODBC is super!

Prof Brian Ripley wrote:
> On Mon, 28 May 2007, Bill Szkotnicki wrote:
>
>> Hello,
>> I have installed R 2.5.0 from sources (x86_64) and added the package RODBC,
>> and now I am trying to connect to a MySQL database.
>> In Windows R, after installing the 3.51 driver and creating the DSN by
>> specifying server, user, and password, it is easy to connect with
>> channel <- odbcConnect(dsn)
>> Does anyone know what needs to be done to make this work from Linux?
>
> Did you not read the RODBC README file? It is described in some detail
> with reference to tutorials.

-- 
Bill Szkotnicki
Department of Animal and Poultry Science
University of Guelph
[EMAIL PROTECTED]
(519)824-4120 Ext 52253
[R] RODBC
Hello,

I have installed R 2.5.0 from sources (x86_64) and added the package RODBC, and now I am trying to connect to a MySQL database.

In Windows R, after installing the 3.51 driver and creating the DSN by specifying server, user, and password, it is easy to connect with

channel <- odbcConnect(dsn)

Does anyone know what needs to be done to make this work from Linux?

Thanks,
Bill
[R] RODBC sqlQuery insert slow
Hello,

I am trying to insert a lot of data into a table using Windows R (2.3.1) and a MySQL database via RODBC. First I read a file with read.csv, then form an SQL insert statement for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up?

Thanks,
Bill

# R code
ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
for (i in 1:ntry) {
sql="INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES("
d1=strptime(ti[i,2],"%d/%m/%y %H:%M:%S %p")
sql=paste(sql,"'",d1,"'" )
sql=paste(sql,",",ti[i,3] )
sql=paste(sql,",",ti[i,4] )
sql=paste(sql,",",ti[i,5] )
sql=paste(sql,",",ti[i,6] )
sql=paste(sql,",",ti[i,7] )
sql=paste(sql,",",ti[i,8] )
sql=paste(sql,",",ti[i,9] )
sql=paste(sql,",",ti[i,10])
sql=paste(sql,",",ti[i,11])
sql=paste(sql,",",ti[i,12])
sql=paste(sql,")" )
#print(sql)
sqlQuery(channel, sql)
}
nafter=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
nadded=nafter-nbefore;nadded
date()
Re: [R] RODBC sqlQuery insert slow
Thanks for the help ... the sqlSave() function was the solution. The lesson, which has been stated many times before, is to avoid loops wherever possible!

Bill

# fast RODBC inserting
dat <- cbind(as.character(strptime(ti[,2],"%d/%m/%y %H:%M:%S %p")),ti[,3:12])
# you need the as.character to make sure the time is stored correctly in mysql
names(dat)=c("time","v1","v2","v3","v4","v5","v6","v7","v8","v9","v10")
sqlSave(channel,dat,"logger",rownames=F,append=T)
# very fast.
#

Jerome Asselin wrote:
> On Fri, 2006-10-13 at 09:09 -0400, Bill Szkotnicki wrote:
>> Hello,
>> I am trying to insert a lot of data into a table using Windows R (2.3.1)
>> and a MySQL database via RODBC. First I read a file with read.csv, then
>> form an SQL insert statement for each row and execute the insert query
>> one row at a time. See the loop below. This turns out to be very slow.
>> Can anyone please suggest a way to speed it up?
>> Thanks,
>> Bill
>>
>> # R code
>> ntry=dim(ti)[1]
>> date()
>> nbefore=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
>> for (i in 1:ntry) {
>> sql="INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES("
>> d1=strptime(ti[i,2],"%d/%m/%y %H:%M:%S %p")
>> sql=paste(sql,"'",d1,"'" )
>> sql=paste(sql,",",ti[i,3] )
>> sql=paste(sql,",",ti[i,4] )
>> sql=paste(sql,",",ti[i,5] )
>> sql=paste(sql,",",ti[i,6] )
>> sql=paste(sql,",",ti[i,7] )
>> sql=paste(sql,",",ti[i,8] )
>> sql=paste(sql,",",ti[i,9] )
>> sql=paste(sql,",",ti[i,10])
>> sql=paste(sql,",",ti[i,11])
>> sql=paste(sql,",",ti[i,12])
>> sql=paste(sql,")" )
>> #print(sql)
>> sqlQuery(channel, sql)
>> }
>> nafter=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
>> nadded=nafter-nbefore;nadded
>> date()
>
> I sure will try to help you out here. I've been working with RODBC.
>
> I think what slows you down here is your loop with multiple paste
> commands. Have you considered the sqlSave() function with the append=T
> argument? I think you could replace your loop with:
>
> dat <- cbind(strptime(ti[,2],"%d/%m/%y %H:%M:%S %p"),d1,ti[,3:12])
> sqlSave(channel,dat,"logger",append=T)
>
> Of course, I haven't tested this so you may need some minor adjustments,
> but I think this will greatly speed up your insert job.
>
> Regards,
> Jerome
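The sqlSave() approach above comes down to building one data frame whose column names match the table and writing it in a single call. The sketch below is only an illustration under assumptions: the toy data frame `ti`, its column names, and the timestamp format (without the AM/PM field) are made up, and the sqlSave() call is left commented out because no database channel is opened here.

```r
# Toy stand-in for the CSV data: column 2 holds the timestamp,
# columns 3-4 hold readings (hypothetical names and values)
ti <- data.frame(id    = 1:2,
                 stamp = c("13/10/06 09:09:00", "13/10/06 09:10:00"),
                 v1    = c(1.5, 2.5),
                 v2    = c(3, 4),
                 stringsAsFactors = FALSE)

# Parse the whole timestamp column at once and format it as text,
# so the value is passed to the database as a plain string
when <- strptime(ti$stamp, "%d/%m/%y %H:%M:%S", tz = "UTC")
dat  <- data.frame(time = format(when, "%Y-%m-%d %H:%M:%S"),
                   ti[, 3:4],
                   stringsAsFactors = FALSE)

# With an open RODBC channel (not created in this sketch), one call
# appends every row to the existing table:
# sqlSave(channel, dat, "logger", rownames = FALSE, append = TRUE)
```

Using data.frame() with named arguments sets the column names in the same step, so the separate names(dat)= assignment is not needed.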
[R] predict.lm
I have a model with a few correlated explanatory variables, i.e.

m1=lm(y~x1+x2+x3+x4,protdata)

and I have used predict as follows:

x=data.frame(x=1:36)
yp=predict(m1,x,se.fit=T)
tprot=sum(yp$fit) # add up the predictions
tprot

tprot is the sum of the 36 predicted values, and I would like the se of that prediction. I think sqrt(sum(yp$se.fit^2)) is not correct. Would anyone know the correct approach? That is, how does one get the se of a function of predicted values (in this case the sum)?

Thanks,
Bill
Re: [R] predict.lm
I did mean to use x1, x2, x3, x4 in the new data frame.

And I think the theory would be something like yhat = 1' K' bhat, and so the variance should be sigma^2 1' K' C K 1, where C = (X'X)^-1, sigma^2 is the residual variance, and 1 is a vector of ones.

The question is: do I need to form these matrices and grind through it, or is there an easier way?

Bill

-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 02, 2006 2:54 PM
To: Christos Hatzis
Cc: 'Bill Szkotnicki'; 'R-Help help'
Subject: Re: [R] predict.lm

On Tue, 2 May 2006, Christos Hatzis wrote:

> I think you got it right.
>
> The mean of the (weighted) sum of a set of random variables is the
> (weighted) sum of the means and its variance is the (weighted) sum of the
> individual variances (using squared weights). Here you don't have to worry
> about weights.
>
> So what you proposed does exactly this.

Yes, but the theory has assumptions which are not met here: the random variables are correlated (in almost all cases).

> -Christos
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Szkotnicki
> Sent: Tuesday, May 02, 2006 2:59 PM
> To: 'R-Help help'
> Subject: [R] predict.lm
>
> I have a model with a few correlated explanatory variables, i.e.
> m1=lm(y~x1+x2+x3+x4,protdata)
> and I have used predict as follows:
> x=data.frame(x=1:36)
> yp=predict(m1,x,se.fit=T)

How can this work? You fitted the model to x1...x4 and supplied x.

> tprot=sum(yp$fit) # add up the predictions
> tprot
>
> tprot is the sum of the 36 predicted values and I would like the se of
> that prediction. I think sqrt(sum(yp$se.fit^2)) is not correct.
> Would anyone know the correct approach? i.e. How to get the se of a
> function of predicted values (in this case sum)

You need to go back to the theory: it is easy to do for a linear function, otherwise you will need to linearize.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
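For a linear function such as the sum, the matrices do not have to be assembled by hand, because vcov() on the fitted model already returns the estimated sigma^2 (X'X)^-1. The following is a sketch under assumptions, not code from the thread: the data are simulated stand-ins for protdata, and `newdat`, `X0` and `a` are names introduced here. It gives the standard error of the sum of the fitted means (the same sense as se.fit), not of future observations.

```r
set.seed(1)

# Simulated stand-in for protdata (hypothetical values)
n <- 50
protdata <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                       x3 = rnorm(n), x4 = rnorm(n))
protdata$y <- 1 + protdata$x1 + 0.5 * protdata$x2 + rnorm(n)
m1 <- lm(y ~ x1 + x2 + x3 + x4, data = protdata)

# New data must contain the same predictors as the model
newdat <- data.frame(x1 = rnorm(36), x2 = rnorm(36),
                     x3 = rnorm(36), x4 = rnorm(36))

# Design matrix for the 36 new rows
X0 <- model.matrix(delete.response(terms(m1)), newdat)

# a = 1'X0 (the column sums), so the sum of predictions is a'bhat
a <- colSums(X0)
tprot <- sum(a * coef(m1))

# Var(a'bhat) = a' vcov(m1) a, which accounts for the correlation
# between the individual fitted values
se_tprot <- sqrt(drop(t(a) %*% vcov(m1) %*% a))

# For comparison: the value that treats the fitted values as independent
yp <- predict(m1, newdat, se.fit = TRUE)
se_naive <- sqrt(sum(yp$se.fit^2))
```

The two standard errors generally differ because the fitted values share the same estimated coefficients; se_naive uses only the diagonal of X0 vcov(m1) X0', while se_tprot uses every element.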
[R] statistical modelling SAS vs R
Hello,

Recently I have been reading a lot of material about statistical modelling using R. There seem to be conflicting opinions between the SAS community and the R community about what the best approach is.

1) In R one might start with a model that has all possible effects of interest in it and then simplify by eliminating/adding insignificant effects using a stepwise procedure.

2) In SAS one may start with a reasonable model, look at Type III SS's to test hypotheses, and report LSMEANS. This can be done in R too, I think.

Does anyone have current opinions about this? I know it has been discussed before, but I would be very interested in hearing about the advantages and pitfalls of both approaches.

Bill
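Both workflows described above can be tried in base R. The sketch below uses simulated data with made-up names (`d`, `full`, `reduced`, `tests`) and is only one possible reading of the two approaches: step() selects by AIC rather than by significance tests, and drop1() tests each term adjusted for all the others, which for a main-effects-only model such as this one should correspond to Type III style tests. LSMEANS are not covered.

```r
set.seed(2)

# Simulated data: only x1 truly affects y (hypothetical example)
d <- data.frame(x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
d$y <- 2 + 1.5 * d$x1 + rnorm(60)

full <- lm(y ~ x1 + x2 + x3, data = d)

# Approach 1: start from the full model and simplify stepwise
# (step() adds or drops terms according to AIC)
reduced <- step(full, trace = 0)

# Approach 2: keep the chosen model and test each term given the others
tests <- drop1(full, test = "F")
```

Printing formula(reduced) and tests side by side shows where the two approaches agree or disagree for a given data set.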