[R] aggregate factor

2007-09-07 Thread Bill Szkotnicki
Hi,
I am using aggregate() to compute means for later plotting.
There are two factors involved, and the problem is that the values of the
second factor (Age) in the means are not in the right order, because
10 comes in between 1 and 2.
What I really want is the numeric value of Age, but as.numeric() and
as.integer() return the level codes instead.
Is there a way to easily get the numeric value?
I am using R 2.5.1 on Windows.

Thanks,

> str(fishdata)
'data.frame':   372 obs. of  6 variables:
 $ Lake: Factor w/ 3 levels "EVANS","JOLLIET",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Age : int  1 1 1 1 1 1 1 1 1 1 ...
 $ TL  : int  132 120 125 115 130 120 115 110 117 116 ...
 $ W   : int  10 10 10 10 10 10 10 10 10 20 ...
 $ Sex : Factor w/ 3 levels "F","I","M": 1 1 2 2 2 1 1 1 2 2 ...
 $ WT  : num  0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 0.23 ...
> fishdatameans=aggregate(fishdata$TL, list(Lake = fishdata$Lake,
+ Age=fishdata$Age), mean)

#  Now Age is a Factor but 10 is in the wrong position.
> fishdatameans$Age
 [1] 0  1  1  1  2  2  2  3  3  3  4  4  4  5  5  6  6  6  7  8  9  10
Levels: 0 1 10 2 3 4 5 6 7 8 9

> as.numeric(fishdatameans$Age)
 [1]  1  2  2  2  4  4  4  5  5  5  6  6  6  7  7  8  8  8  9 10 11  3

# What I want is  0  1  1  1  2  2  2  3  3  3  4  4  4  5  5  6  6  6  7  8  9  10

Bill
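
A minimal sketch of the usual way to get the numbers back out of such a factor: convert the level labels, not the level codes. The `age` vector below is a made-up stand-in for fishdatameans$Age, with its levels set explicitly in the text-sorted order shown in the post; it is not the real data.

```r
# Hypothetical stand-in for fishdatameans$Age; levels are given explicitly
# in text-sorted order, so "10" sits before "2" as in the post.
age <- factor(c("0", "1", "1", "2", "9", "10"),
              levels = c("0", "1", "10", "2", "9"))

as.numeric(age)                            # level codes: 1 2 2 4 5 3

# Convert the labels rather than the codes.
age_num  <- as.numeric(as.character(age))  # 0 1 1 2 9 10
age_num2 <- as.numeric(levels(age))[age]   # same values, converts each level once

# Sorting (or plotting) now follows numeric order.
sort(age_num)
```

Applied to the data in the post, this would presumably be fishdatameans$Age <- as.numeric(as.character(fishdatameans$Age)).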

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RODBC

2007-05-29 Thread Bill Szkotnicki
I have now read the README file, which I should have done before. :-[
Sorry.
To summarize:
- Install the ODBC connector driver (3.51)
- Set up the DSN in the file .odbc.ini
- It works beautifully and RODBC is super!
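
Once the driver and DSN are in place, the R side might look like the sketch below. It assumes the RODBC package is installed and that the DSN name matches a section in ~/.odbc.ini; "mydsn" and the helper name list_tables are hypothetical, and nothing here has been run against a real database.

```r
# Sketch only: needs the RODBC package, a MySQL ODBC driver, and a DSN
# entry in ~/.odbc.ini. "mydsn" is a hypothetical DSN name.
list_tables <- function(dsn = "mydsn") {
  channel <- RODBC::odbcConnect(dsn)
  # odbcConnect() returns -1 (with a warning) when the connection fails
  if (!inherits(channel, "RODBC")) stop("could not open DSN: ", dsn)
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlTables(channel)   # list the tables visible through this DSN
}

# tables <- list_tables("mydsn")
```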


Prof Brian Ripley wrote:
> On Mon, 28 May 2007, Bill Szkotnicki wrote:
>
>> Hello,
>>
>> I have installed R2.5.0 from sources ( x86_64 )
>> and added the package RODBC
>> and now I am trying to connect to a mysql database
>> In windows R after installing the 3.51 driver
>> and creating the dsn by specifying server, user, and password
>> it is easy to connect with
>> channel <- odbcConnect("dsn")
>>
>> Does anyone know what needs to be done to make this work from linux?
>
> Did you not read the RODBC README file?  It is described in some detail
> with reference to tutorials.


-- 
Bill Szkotnicki
Department of Animal and Poultry Science
University of Guelph
[EMAIL PROTECTED]
(519)824-4120 Ext 52253



[R] RODBC

2007-05-28 Thread Bill Szkotnicki
Hello,

I have installed R 2.5.0 from sources (x86_64)
and added the package RODBC,
and now I am trying to connect to a MySQL database.
In R on Windows, after installing the 3.51 driver
and creating the DSN by specifying server, user, and password,
it is easy to connect with
channel <- odbcConnect("dsn")

Does anyone know what needs to be done to make this work from linux?

Thanks, Bill



[R] RODBC sqlQuery insert slow

2006-10-13 Thread Bill Szkotnicki
Hello,
I am trying to insert a lot of data into a table using R (2.3.1) on Windows
and a MySQL database via RODBC.
First I read a file with read.csv() and then form an SQL INSERT statement
for each row and execute the insert query one row at a time. See the
loop below.
This turns out to be very slow.
Can anyone please suggest a way to speed it up?

Thanks, Bill

# R code
ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
for (i in 1:ntry) {
  sql="INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES("
  d1=strptime(ti[i,2],"%d/%m/%y %H:%M:%S %p")
  sql=paste(sql,"'",d1,"'" )
  sql=paste(sql,",",ti[i,3] )
  sql=paste(sql,",",ti[i,4] )
  sql=paste(sql,",",ti[i,5] )
  sql=paste(sql,",",ti[i,6] )
  sql=paste(sql,",",ti[i,7] )
  sql=paste(sql,",",ti[i,8] )
  sql=paste(sql,",",ti[i,9] )
  sql=paste(sql,",",ti[i,10])
  sql=paste(sql,",",ti[i,11])
  sql=paste(sql,",",ti[i,12])
  sql=paste(sql,")" )
  #print(sql)
  sqlQuery(channel, sql)
}
nafter=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
nadded=nafter-nbefore;nadded
date()



Re: [R] RODBC sqlQuery insert slow

2006-10-13 Thread Bill Szkotnicki
Thanks for the help ... the  sqlSave() function was the solution.
The lesson, which has been stated many times before,  is to avoid loops 
wherever possible!
Bill

# fast RODBC inserting
dat <- cbind(as.character(strptime(ti[,2],"%d/%m/%y %H:%M:%S %p")),ti[,3:12])
# you need the as.character to make sure the time is stored correctly in mysql
names(dat)=c("time","v1","v2","v3","v4","v5","v6","v7","v8","v9","v10")
sqlSave(channel,dat,"logger",rownames=FALSE,append=TRUE) # very fast.
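
The data-frame step of that solution can be tried without a database. The sketch below builds a small made-up `ti` (the real CSV is not in the thread), assumes a 24-hour timestamp layout without the AM/PM field, and uses format() to fix the text form of the time; the sqlSave() call is left commented out because it needs an open RODBC channel.

```r
# Hypothetical stand-in for the CSV read into `ti`: column 2 holds the
# timestamp text, columns 3..12 hold the ten readings.
ti <- data.frame(
  id   = 1:3,
  when = c("13/10/06 09:09:00", "13/10/06 09:10:00", "13/10/06 09:11:00"),
  matrix(1:30, nrow = 3),
  stringsAsFactors = FALSE
)

# Parse every timestamp in one vectorised call, then write it back out as
# plain text in year-month-day order (assumed here to suit the time column).
stamp <- format(strptime(ti[, 2], "%d/%m/%y %H:%M:%S", tz = "UTC"),
                "%Y-%m-%d %H:%M:%S")

dat <- data.frame(time = stamp, ti[, 3:12], stringsAsFactors = FALSE)
names(dat) <- c("time", paste0("v", 1:10))

# With an open channel, all rows then go in with a single call:
# RODBC::sqlSave(channel, dat, tablename = "logger",
#                rownames = FALSE, append = TRUE)
```

The speed-up reported in the thread comes from replacing one sqlQuery() round trip per row with one sqlSave() call for the whole data frame.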

Jerome Asselin wrote:
> On Fri, 2006-10-13 at 09:09 -0400, Bill Szkotnicki wrote:
>
>> Hello,
>> I am trying to insert a lot of data into a table using windows R (2.3.1)
>> and a mysql database via RODBC.
>> First I read a file with read.csv and then form sql insert statements
>> for each row and execute the insert query one row at a time. See the
>> loop below.
>> This turns out to be very slow.
>> Can anyone please suggest a way to speed it up?
>>
>> Thanks, Bill
>>
>> # R code
>> ntry=dim(ti)[1]
>> date()
>> nbefore=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
>> for (i in 1:ntry) {
>> sql="INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES("
>> d1=strptime(ti[i,2],"%d/%m/%y %H:%M:%S %p")
>> sql=paste(sql,"'",d1,"'" )
>> sql=paste(sql,",",ti[i,3] )
>> sql=paste(sql,",",ti[i,4] )
>> sql=paste(sql,",",ti[i,5] )
>> sql=paste(sql,",",ti[i,6] )
>> sql=paste(sql,",",ti[i,7] )
>> sql=paste(sql,",",ti[i,8] )
>> sql=paste(sql,",",ti[i,9] )
>> sql=paste(sql,",",ti[i,10])
>> sql=paste(sql,",",ti[i,11])
>> sql=paste(sql,",",ti[i,12])
>> sql=paste(sql,")" )
>> #print(sql)
>> sqlQuery(channel, sql)
>> }
>> nafter=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
>> nadded=nafter-nbefore;nadded
>> date()
>
> I sure will try to help you out here. I've been working with RODBC. I
> think what slows you down here is your loop with multiple paste
> commands.
>
> Have you considered the sqlSave() function with the append=T argument? I
> think you could replace your loop with:
>
> dat <- cbind(strptime(ti[,2],"%d/%m/%y %H:%M:%S %p"),d1,ti[,3:12])
> sqlSave(channel,dat,"logger",append=T)
>
> Of course, I haven't tested this so you may need some minor adjustments,
> but I think this will greatly speed up your insert job.
>
> Regards,
> Jerome




[R] predict.lm

2006-05-02 Thread Bill Szkotnicki
I have a model with a few correlated explanatory variables.
i.e.
 m1=lm(y~x1+x2+x3+x4,protdata)
and I have used predict as follows:

 x=data.frame(x=1:36)
 yp=predict(m1,x,se.fit=T) 
 tprot=sum(yp$fit) # add up the predictions
 tprot

tprot is the sum of the 36 predicted values and I would like the se of that
prediction.
I think  
 sqrt(sum(yp$se.fit^2))
is not correct.

Would anyone know the correct approach?
i.e. How to get the se of a function of predicted values (in this case sum)
 
Thanks, Bill



Re: [R] predict.lm

2006-05-02 Thread Bill Szkotnicki
I did mean to use x1,x2,x3,x4 in the new data frame.

And I think the theory would be something like

tprot = 1' K' bhat
and so the variance should be  sigma^2 * 1' K' C K 1,  where C = (X'X)^-1,
sigma^2 is the residual variance, and 1 is a vector of ones.

The question is do I need to form these matrices and grind through it or is
there an easier way?
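
One way to avoid forming the pieces by hand is to let vcov() supply the estimated sigma^2 (X'X)^-1 and model.matrix() supply the design rows for the new data. The sketch below is only an illustration under stated assumptions: protdata is simulated (the real data are not in the thread), and newx is a hypothetical new data frame that holds x1..x4.

```r
set.seed(1)
# Simulated stand-in for protdata (the real data are not in the thread).
n <- 60
protdata <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
protdata$y <- with(protdata, 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(n))
m1 <- lm(y ~ x1 + x2 + x3 + x4, data = protdata)

# Hypothetical new data: 36 rows holding x1..x4.
newx <- protdata[1:36, c("x1", "x2", "x3", "x4")]

# Design matrix for the new rows (K' in the notation above), 36 x 5.
Xnew <- model.matrix(delete.response(terms(m1)), newx)
a <- colSums(Xnew)                      # K 1: the weight on each coefficient

tprot  <- sum(a * coef(m1))             # sum of the 36 predictions
se_sum <- sqrt(drop(t(a) %*% vcov(m1) %*% a))

# For comparison: this ignores the covariances between the predictions.
naive <- sqrt(sum(predict(m1, newx, se.fit = TRUE)$se.fit^2))
```

Here se_sum is the standard error of the estimated sum of fitted means; it differs from naive exactly by the off-diagonal covariance terms, which is the point raised in the reply quoted below.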

 
Bill
 

-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 02, 2006 2:54 PM
To: Christos Hatzis
Cc: 'Bill Szkotnicki'; 'R-Help help'
Subject: Re: [R] predict.lm

On Tue, 2 May 2006, Christos Hatzis wrote:

> I think you got it right.
>
> The mean of the (weighted) sum of a set of random variables is the
> (weighted) sum of the means and its variance is the (weighted) sum of the
> individual variances (using squared weights).  Here you don't have to worry
> about weights.
>
> So what you proposed does exactly this.

Yes, but the theory has assumptions which are not met here: the random
variables are correlated (in almost all cases).

> -Christos
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Bill Szkotnicki
> Sent: Tuesday, May 02, 2006 2:59 PM
> To: 'R-Help help'
> Subject: [R] predict.lm
>
> I have a model with a few correlated explanatory variables.
> i.e.
> m1=lm(y~x1+x2+x3+x4,protdata)
> and I have used predict as follows:
>
> x=data.frame(x=1:36)
> yp=predict(m1,x,se.fit=T)

How can this work?  You fitted the model to x1...x4 and supplied x.

> tprot=sum(yp$fit) # add up the predictions
> tprot
>
> tprot is the sum of the 36 predicted values and I would like the se of that
> prediction.
> I think
> sqrt(sum(yp$se.fit^2))
> is not correct.
>
> Would anyone know the correct approach?
> i.e. How to get the se of a function of predicted values (in this case sum)

You need to go back to the theory: it is easy to do for a linear function,
otherwise you will need to linearize.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] statistical modelling SAS vs R

2006-02-03 Thread Bill Szkotnicki
Hello,

Recently I have been reading a lot of material about statistical modeling
using R. There seem to be conflicting opinions between the SAS community and
the R community about what the best approach is.
1) In R one might start with a model that has all possible effects of
interest in it and then simplify by eliminating/adding insignificant effects
using a stepwise procedure.
2) In SAS one may start with a reasonable model and look at type 3 SS's
to test hypotheses and report LSMEANS. This can be done in R too, I think.
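
For concreteness, the two workflows might look like this in base R. This is only a sketch using the built-in mtcars data and an arbitrary main-effects model, neither of which comes from the post; it takes no position on which approach is preferable, and LSMEANS are not shown because they need an add-on package.

```r
# Arbitrary illustrative model on a built-in dataset.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# 1) Stepwise simplification: backward elimination by AIC.
reduced <- step(full, direction = "backward", trace = 0)
formula(reduced)

# 2) Keep the chosen model and test each term adjusted for all the others.
#    For a model without interactions these marginal F tests play the role
#    of the type 3 tests mentioned above.
marginal <- drop1(full, test = "F")
marginal
```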

Does anyone have current opinions about this? I know it's been discussed
before but I would be very interested in hearing about the advantages and
pitfalls of both approaches.

Bill
