Did you really say you're using Word's mail merge to construct "hundreds" of
pages of R code which you then paste into R? It sounds like you've simply
missed how to create a function in R. Did you fully read the manual "An
Introduction to R"? Did you know R can read .xls files directly, and connect
to spreadsheets as if they were databases? See ?odbcConnectExcel in the
RODBC package.
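To make that concrete, here is a minimal sketch of the function approach.
The file name, sheet name, test names, column layout and the mean/sd summary
below are all made up for illustration; your real data and analysis will
differ:

    library(RODBC)                       # provides odbcConnectExcel (Windows)
    ch  <- odbcConnectExcel("labs.xls")  # hypothetical workbook name
    dat <- sqlFetch(ch, "Sheet1")        # read one worksheet as a data frame
    close(ch)

    ## One short function replaces mail-merged pages of near-identical code.
    ## 'dat' is assumed to have one numeric column per analytical test.
    analyse <- function(test) {
        x <- dat[[test]]
        plot(x, main = test)             # the per-test graph
        c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
    }

    pdf("graphs.pdf")
    results <- sapply(c("calcium", "nitrogen", "phosphorus", "potassium"),
                      analyse)
    dev.off()

One call per test, generated by sapply() rather than by Word, and adding a
new test is one more string in the vector rather than another mail merge.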
Your graphs may exist and be beautiful, but are they correct? This link
contains a formal discussion of the topic:
http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

"Jerry Floren" <[email protected]> wrote in message
news:[email protected]...
>
> As a novice R user, I face a similar challenge. I am almost afraid to
> share with this group how I solved it. About 65 labs in our proficiency
> program submit data on individual Excel spreadsheets with triple
> replicates. There are always a few labs that do not complete the full set
> of three replicates, and I do not want their data included in my analysis.
>
> First, I combine all the individual spreadsheets into one large Excel
> spreadsheet. The replicates are in three columns: rep1, rep2, and rep3. I
> sort on each individual rep column in Excel. Then I go to both the top and
> the bottom of the list.
>
> For example, I sort on rep1 and go to the top of the list to delete any
> rows where a value for rep1 was not recorded. Then I go to the bottom of
> the list and delete any rows where rep1 is text instead of a number, for
> example, <0.001. I should say that the labs are instructed that they must
> complete all three replicates, and they must not enter results as text.
> Next I repeat the process for rep2 and rep3.
>
> I'll do a little more work in Excel on the large, combined table with all
> the lab data. I calculate in Excel the mean, standard deviation, and
> coefficient of variation for each of the three reps. Then I filter all the
> data and delete duplicate rows. This is necessary as I sometimes
> accidentally copy the same spreadsheet twice from a lab into my large
> table. Finally, I save the cleaned-up table in *.csv format that is easily
> read into R.
>
> I know that R can do all of these things, but if you are just learning
> how to use R it might be easier to do some initial work in Excel, or a
> similar spreadsheet, before running your data through R.
>
> I also use MS Word's mail merge feature to generate my code. I'll get
> three or four pages of code doing what I want for a single analytical
> test, for example, calcium. Then I'll use the mail merge feature to
> generate hundreds of pages of code for the other analytical tests
> (nitrogen, phosphorus, potassium, etc.). I just copy and paste the large,
> merged Word document into R. R cranks away for 30 minutes and I end up
> with several large tables (and these get additional editing in Excel) and
> hundreds of beautiful graphs that would take weeks to create in Excel.
>
> I was amazed that Word would work. I expected all of Word's special print
> control codes would mess things up. I just recently received a new laptop
> computer, and now I have an occasional problem with Word's "pretty print
> quotes," but if you know about that problem, it is easy to fix.
>
> Jerry Floren
> Minnesota Department of Agriculture
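Incidentally, the whole cleaning step described above can be done in a few
lines of R instead of by hand in Excel. A sketch, assuming the combined
table has been saved as combined.csv (a hypothetical name) with the
replicate columns rep1, rep2 and rep3:

    labs <- read.csv("combined.csv", stringsAsFactors = FALSE)

    ## Coerce the replicates to numeric: text entries such as "<0.001"
    ## become NA (with a warning) ...
    reps <- c("rep1", "rep2", "rep3")
    labs[reps] <- lapply(labs[reps], as.numeric)

    ## ... then keep only rows where all three replicates are present,
    ## and drop accidentally duplicated rows
    labs <- labs[complete.cases(labs[reps]), ]
    labs <- labs[!duplicated(labs), ]

    ## Per-row mean, standard deviation and coefficient of variation
    labs$mean <- rowMeans(labs[reps])
    labs$sd   <- apply(labs[reps], 1, sd)
    labs$cv   <- 100 * labs$sd / labs$mean

With read.xls (from gdata) or odbcConnectExcel the combining step could be
moved into R as well; the csv route just keeps the sketch simple.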
Matthew Dowle-3 wrote:
>>
>> As can data.table (i.e. do 'having' in one statement):
>>
>>> DT = data.table(DF)
>>> DT[,list(n=length(NAME),mean(SCORE)),by="NAME"][n==3]
>>      NAME n       V2
>> [1,] James 3 64.00000
>> [2,] Tom   3 78.66667
>>
>> but data.table isn't restricted to SQL functions (such as avg): any R
>> function can be used, sometimes for its side effects (such as plotting)
>> rather than just returning data.
>>
>> Further, data.table has a thing called 'join inherited scoping'. Say we
>> knew the specific groups; we can go directly to them (without even
>> looking at the rest of the data in the table) in very short and
>> convenient syntax, which also happens to run quickly on large data sets
>> (but can be useful just for the syntax alone):
>>
>>> setkey(DT,NAME)
>>> DT[c("James","Tom"),mean(SCORE),mult="all"]
>>      NAME       V1
>> [1,] James 64.00000
>> [2,] Tom   78.66667
>>
>> Notice there is no "group by" or even a "by" in the above. It inherits
>> the scope from the join because mult="all" means that "James" matches
>> multiple rows, as does "Tom", creating two groups. It does a binary
>> search to the beginning of each group and to the end of the group, and
>> runs the R expression inside the scope of that group.
>>
>> An example of join inherited scoping for the side effects only:
>>
>>> pdf("out.pdf")
>>> DT[c("James","Tom"),plot(SCORE),mult="all"]
>> NULL data table
>>> dev.off()
>> # out.pdf now contains 2 plots
>>
>> which you couldn't do in SQL, because SQL has no plotting (or any of R's
>> other packages).
>>
>> It aims to do this quickly, where 'quickly' means (1) shorter code is
>> quicker to write, read, debug and maintain, and (2) quicker to compute;
>> and it's (1) that often dominates (2).
>>
>> Finally, consider the following two statements, which are equivalent:
>>
>>> sqldf("select NAME, avg(SCORE) from DF group by NAME having count(*) = 3")
>>     NAME avg(SCORE)
>> 1  James   64.00000
>> 2    Tom   78.66667
>>> DT[ J(DT[,length(NAME),by="NAME"][V1==3,NAME]), mean(SCORE), mult="all"]
>>     NAME avg(SCORE)
>> 1  James   64.00000
>> 2    Tom   78.66667
>>
>> Now, OK, I hear you groaning (!) that the second looks (on first glance)
>> ugly, but bear with me: in the SQL solution, do you know for sure that
>> avg(SCORE) isn't computed wastefully for all the groups that don't have
>> count(*)=3? It might well do the 'group by' first for all the groups,
>> then apply the 'having' afterwards as a 'where' on the result. It might
>> depend on the particular SQL database being used (MySQL, SQLite, etc.),
>> the installation parameters, any indexes, etc. Some investigation would
>> be required (taking time) if someone doesn't already know. In the
>> data.table, however, the syntax explicitly makes it clear that
>> mean(SCORE) is only computed for the particular groups. For certain,
>> always. Maybe this particular example is not a good one, but I'm trying
>> to demonstrate an overall syntax which is scalable (i.e. this syntax can
>> do more complicated things that SQL can't do, or can't do well). Notice
>> that the earlier method, i.e.
>> DT[,list(n=length(NAME),mean(SCORE)),by="NAME"][n==3], is simpler but
>> wasteful, as it does compute mean(SCORE) for all the groups. But the
>> syntax explicitly conveys what is being done, and the user has the
>> choice.
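(For completeness, the original question can also be answered in base R with
no packages at all. A sketch, using the same DF as constructed in the
message below:

    counts <- table(DF$NAME)                  # observations per name
    keep   <- DF$NAME %in% names(counts)[counts == 3]
    tapply(DF$SCORE[keep], factor(DF$NAME[keep]), mean)
    ##    James      Tom
    ## 64.00000 78.66667

There is no 'having' clause to reason about here: the filter is explicit.)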
>> "Gabor Grothendieck" <[email protected]> wrote in message
>> news:[email protected]...
>> Here is the solution using sqldf, which can do it in one statement:
>>
>>> # read in data
>>> Lines <- "OBS NAME SCORE
>> + 1 Tom 92
>> + 2 Tom 88
>> + 3 Tom 56
>> + 4 James 85
>> + 5 James 75
>> + 6 James 32
>> + 7 Dawn 56
>> + 8 Dawn 91
>> + 9 Clara 95
>> + 10 Clara 84"
>>>
>>> DF <- read.table(textConnection(Lines), header = TRUE)
>>>
>>> # run
>>> library(sqldf)
>>> sqldf("select NAME, avg(SCORE) from DF group by NAME having count(*) = 3")
>>     NAME avg(SCORE)
>> 1  James   64.00000
>> 2    Tom   78.66667
>>
>> On Tue, Jan 5, 2010 at 2:03 PM, Gabor Grothendieck
>> <[email protected]> wrote:
>>> Have a look at this post and the rest of that thread:
>>>
>>> https://stat.ethz.ch/pipermail/r-help/2010-January/223420.html
>>>
>>> On Tue, Jan 5, 2010 at 1:29 PM, Geoffrey Smith <[email protected]> wrote:
>>>> Hello, does anyone know how to take the mean for a subset of
>>>> observations? For example, suppose my data looks like this:
>>>>
>>>> OBS NAME SCORE
>>>> 1   Tom   92
>>>> 2   Tom   88
>>>> 3   Tom   56
>>>> 4   James 85
>>>> 5   James 75
>>>> 6   James 32
>>>> 7   Dawn  56
>>>> 8   Dawn  91
>>>> 9   Clara 95
>>>> 10  Clara 84
>>>>
>>>> Is there a way to get the mean of the SCORE variable by NAME, but only
>>>> when the number of observations is equal to 3? In other words, is
>>>> there a way to get the mean of the SCORE variable for Tom and James,
>>>> but not for Dawn and Clara? Thank you.
>>>>
>>>> --
>>>> Geoffrey Smith
>>>> Visiting Assistant Professor
>>>> Department of Finance
>>>> W. P. Carey School of Business
>>>> Arizona State University

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

