[R] Memory management
I am trying to fit a very large Bradley-Terry model using the BradleyTerry2 package (there are 288 players in the BT model). I ran the model below successfully; WLMat is a 288 by 288 win-loss matrix:

    WLdf <- countsToBinomial(WLMat)
    mod1 <- BTm(cbind(win1, win2), player1, player2, ~ player, id = "player", data = WLdf)

I then needed to run the same model on a subset of the observations that went into the win-loss matrix, so I created a new win-loss matrix and tried to fit a new model. Now I get:

    Error: cannot allocate vector of size 90.5 Mb

I find this particularly puzzling because the input data are the same size as for the original model, just with different values. I tried increasing the memory size and I tried running in a clean workspace, and the error message is always the same. Sometimes the vector it is trying to allocate is 181.0 Mb (twice as large), but it is always one of those two numbers no matter what I do to the available memory.

To further complicate this, I can no longer re-run my first model either; I get the same errors. traceback() indicates that the error occurs when the program is trying to do a QR decomposition.

R 2.13.0, Windows XP

Any suggestions?

W. Michael Conklin
Chief Methodologist
Google Voice: (612) 56STATS
MarketTools, Inc. | www.markettools.com
6465 Wayzata Blvd | Suite 170 | St. Louis Park, MN 55426.
PHONE: 952.417.4719 | CELL: 612.201.8978

This email and attachment(s) may contain confidential and/or proprietary information and is intended only for the intended addressee(s) or its authorized agent(s). Any disclosure, printing, copying or use of such information is strictly prohibited. If this email and/or attachment(s) were received in error, please immediately notify the sender and delete all copies

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
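The two sizes in the error message can be checked with a little arithmetic. The sketch below rests on assumptions that are not stated in the post: that countsToBinomial() returns one row per unordered pair of players, and that the design matrix passed to the QR step has one column per player except a reference player. Under those assumptions the numbers line up with a single matrix of doubles.

```r
# Assumption: one row per unordered pair of players, and one column per
# player minus a reference player (neither is confirmed by the post).
n_players <- 288
n_pairs   <- choose(n_players, 2)    # 41328 rows
n_cols    <- n_players - 1           # 287 columns
bytes     <- n_pairs * n_cols * 8    # 8 bytes per double
size_mb   <- bytes / 2^20

round(size_mb, 1)       # 90.5
round(2 * size_mb, 1)   # 181
```

If that reading is right, the failing allocation is one copy (or two) of the design matrix. On a 32-bit Windows build such a request can fail when no contiguous block of address space of that size is free, even when the total free memory looks sufficient, which may be why changing the memory limit made no difference.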
Re: [R] Anyone successfully install Rgraphviz on windows with R 2.13?
Thanks to everyone who responded. The ReadMe file did the trick. It is too bad that it is so well hidden :)

W. Michael Conklin

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Thursday, May 19, 2011 8:51 PM
To: Michael Conklin
Cc: R-help
Subject: Re: [R] Anyone successfully install Rgraphviz on windows with R 2.13?

On Thu, May 19, 2011 at 4:28 PM, Michael Conklin michael.conk...@markettools.com wrote:
> I have been trying to get Rgraphviz to work (I know it is from Bioconductor) unsuccessfully. Since I have no experience with Bioconductor I thought I would ask here if anyone has advice. I have installed Graphviz 2.20.3 as is recommended on the Bioconductor site but basically R cannot seem to find the needed dll files. So, even though I have added the appropriate directories to the system path R cannot seem to find them. Any tips would be appreciated.

Be sure to read the installation instructions. Unfortunately they really hid them. You have to download and detar the source package and then look at the README in it.

Regarding the path, I have graphviz installed in C:\Program Files\Graphviz2.20 on my Windows Vista system yet it works, so at least on Windows I don't think it matters that there are spaces in the path.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
[R] Anyone successfully install Rgraphviz on windows with R 2.13?
I have been trying to get Rgraphviz to work (I know it is from Bioconductor) unsuccessfully. Since I have no experience with Bioconductor I thought I would ask here if anyone has advice. I have installed Graphviz 2.20.3 as is recommended on the Bioconductor site but basically R cannot seem to find the needed dll files. So, even though I have added the appropriate directories to the system path R cannot seem to find them. Any tips would be appreciated.

W. Michael Conklin
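When a DLL cannot be found even though the system path was edited, it can help to look at the path from inside the running R session, since a session started before the change still carries the old value. The following is only a diagnostic sketch; it assumes the Graphviz directory has "graphviz" somewhere in its name and that the Graphviz executable is called dot.

```r
# Split the PATH seen by this R session into its individual entries.
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]

# Entries that mention Graphviz (assumes "graphviz" appears in the directory name).
graphviz_entries <- grep("graphviz", path_entries, ignore.case = TRUE, value = TRUE)
print(graphviz_entries)

# Full path of the Graphviz 'dot' executable, or "" if it is not on the PATH.
dot_path <- Sys.which("dot")
print(dot_path)
```

An empty result from both checks would suggest that the session does not see the Graphviz directories at all, in which case restarting R after editing the path is worth trying.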
[R] Windows editor suggestions - autosave
I am looking for advice on an editor to use with R (Windows) that has an autosave feature. I typically write scripts using the RGui (and tried Tinn-R yesterday), but I am having continuing problems with BSODs (not R related) and have in the past had issues with R crashes. I would really like a system that does not require me to remember to hit the save button on my script every 10 minutes just to avoid redoing everything.

W. Michael Conklin
Re: [R] Analogue to SPSS regression commands ENTER and REMOVE in R?
I bet you stirred the pot here because you are asking about stepwise procedures. Look at step(), or stepAIC() in the MASS package.

Mike

On Thu, 4 Mar 2010 07:47:34 -0800 Dimitri Liakhovitski ld7...@gmail.com wrote:
> I am not sure if this question has been asked before - but is there a procedure in R (in lm or glm?) that is equivalent to ENTER and REMOVE regression commands in SPSS? Thanks a lot!
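As a rough illustration of the suggestion above, here is a sketch using the built-in mtcars data (the data set and the choice of predictors are placeholders, not anything from the original question). It also shows update(), which is not mentioned in the reply but is probably the closest base-R analogue of forcing a term in or out of a model by hand.

```r
# Full model on a built-in data set (mtcars is only an illustration).
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Manual analogue of REMOVE / ENTER: drop or add a term with update().
without_drat <- update(full, . ~ . - drat)
with_drat    <- update(without_drat, . ~ . + drat)

# Automated stepwise selection by AIC, starting from the full model.
selected <- step(full, direction = "both", trace = 0)
formula(selected)
```

MASS::stepAIC() takes much the same arguments as step(); which one to use is largely a matter of which model classes need to be supported.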
[R] Scraping a web page
I would like to be able to submit a list of URLs of various web pages and extract the content, i.e. the text and not the mark-up of those pages. I can find plenty of examples in the XML package of extracting links from pages, but I cannot seem to find a way to extract the text. I will not know the structure of the pages I submit in advance. Any suggestions on where to look would be greatly appreciated.

Mike
Re: [R] Mixed effect multinomial regression
The bayesm package implements such models.

Hth,
Mike

On Tue, 6 Oct 2009 12:41:18 -0700 James Martin just.strut...@gmail.com wrote:
> Hello list, I was trying to investigate the possible use of a mixed effect multinomial logit model in R. Does anyone have suggestions on where to find information on these models and the associated functions in R. Thanks in advance, jm
Re: [R] Problem with predict.coxph
In examining the predict.coxph function in the survival library I have with R 2.7.1 versus the one with R 2.9.1, I find a major rewrite of the function. A number of internal survival functions are no longer present, so much of the code has changed. This makes identifying the specific problem beyond my capabilities.

What I want to do is generate predictions for specific combinations of covariates. The number of combinations I am interested in is different from the number of records in the original data file. Any help would be appreciated, as some of the graphics routines I want to use on the data are only available in 2.8 or greater - meaning I am currently looking at running two different versions of R to get the project done.

TIA,
Michael Conklin

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Conklin
Sent: Tuesday, August 18, 2009 8:26 PM
To: r-help@r-project.org
Subject: [R] Problem with predict.coxph
[R] Problem with predict.coxph
We occasionally utilize the coxph function in the survival library to fit multinomial logit models. (The breslow method produces the same likelihood function as the multinomial logit.) We then utilize the predict function to create summary results for various combinations of covariates. For example:

    mod1 <- coxph(Depvar ~ Price:Product + strata(ID), data = MyDCMData2, na.action = na.omit, method = "breslow")

The model runs fine. Then we create some new data that is all combinations of Price and Product and retrieve the summary linear predictors.

    newdata = expand.grid(Price = factor(as.character(1:5)), Product = factor(as.character(1:5)))
    ## create a utility matrix for all combinations of prices and products
    totalut <- predict(mod1, newdata = newdata, type = "lp")

Under R 2.7.1 this produces the following output:

    totalut
              [,1]
    1   0.01534582
    2  -0.07628528
    3  -0.88085189
    4  -1.19458045
    5  -1.03579684
    6   0.40065672
    7   0.15922492
    8  -0.49233524
    9  -0.65483441
    10 -1.07739920
    11  0.27589201
    12  0.48055065
    13  0.33638585
    14 -0.28416678
    15 -0.48762319
    16  1.06071986
    17  0.69041596
    18  0.67479476
    19  0.36360168
    20 -0.09492167
    21  0.66554276
    22  0.55748465
    23  0.37596413
    24  0.01612020
    25 -0.03567735

The problem is that under R 2.8.1 and R 2.9.1 the previous line fails with the following error:

    totalut <- predict(mod1, newdata = newdata, type = "lp")
    Error in model.frame.default(Terms2, newdata, xlev = object$xlevels) :
      variable lengths differ (found for 'Price')
    In addition: Warning message:
    'newdata' had 25 rows but variable(s) found have 43350 rows

Does anyone have an idea what is going on?

Best regards,
Michael Conklin
[R] Problem with Random Forest predict
I am trying to run a partialPlot with Random Forest (as I have done many times before). First I run my forest. Cell is a 6 level factor that is the dependent variable - all other variables are predictors, and most of these are factors as well.

    predCell <- randomForest(x = tempdata[-match("Cell", names(tempdata))], y = tempdata$Cell, importance = T)

Then I try my partial plot to look at the effect of a specific predictor.

    partialPlot(x = predCell, pred.data = tempdata[-match("Cell", names(tempdata))], x.var = "P7_6")

I get this error:

    Error in predict.randomForest(x, x.data, type = "prob") :
      Type of predictors in new data do not match that of the training data.

In examining randomForest:::predict.randomForest I see the following code that produces this error message.

    cat.new <- sapply(x, function(x) if (is.factor(x) && !is.ordered(x))
        length(levels(x)) else 1)
    if (!all(object$forest$ncat == cat.new))
        stop("Type of predictors in new data do not match that of the training data.")
    }

The odd thing is that if I run this code outside of the function:

    all(predCell$forest$ncat ==
        sapply(tempdata[-match("Cell", names(tempdata))], function(x) if (is.factor(x) &&
            !is.ordered(x))
            length(levels(x))
        else 1))
    [1] TRUE

which should avoid the stop function. Here is the session info.

    R version 2.8.1 (2008-12-22)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] randomForest_4.5-30

Any ideas would be greatly appreciated.

W. Michael Conklin
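To make the quoted check concrete, here is a base-R sketch of what it computes. The data frame and the helper name count_categories are made up for illustration, and nothing here diagnoses the actual error in the post; it only shows one way the per-predictor category counts can come to differ between training data and new data.

```r
# Count categories per predictor the way the quoted check does:
# unordered factors contribute their number of levels, everything else 1.
count_categories <- function(df) {
  sapply(df, function(col) {
    if (is.factor(col) && !is.ordered(col)) length(levels(col)) else 1
  })
}

train <- data.frame(
  region = factor(c("north", "south", "east", "west")),
  income = c(10, 20, 30, 40)
)

# Hypothetical new data: re-creating the factor on a subset drops unused levels.
new_data <- train[1:2, ]
new_data$region <- factor(new_data$region)

ncat_train <- count_categories(train)      # region = 4, income = 1
ncat_new   <- count_categories(new_data)   # region = 2, income = 1
all(ncat_train == ncat_new)                # FALSE, so the check would stop()
```

Because the check compares counts on the data frame that actually reaches predict(), a comparison run on the original data frame outside the function can pass even if the frame built inside another function differs.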
Re: [R] performing function on data frame
    newDF <- as.data.frame(scale(oldDF))

see ?scale

Hope that helps.

Michael Conklin

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Karin Lagesen
Sent: Thursday, April 16, 2009 5:29 AM
To: r-help@r-project.org
Subject: Re: [R] performing function on data frame

David Hajage dhajag...@gmail.com writes:
> Hi Karin, I'm not sure I understand... Is this what you want ?
> d$y - mean(d$y)/sd(d$y)

Yes, and also a bit no. Each column in my data frame represents one data set. For every element in this data set I want to know the z value for that element. I.e: I want to create a new data frame from the old data frame, where each element in the new data frame is

    newDF[i,j] = oldDF[i,j] - mean(d[,j]) / sddev(d[,j])

I could, I think, iterate like this over the data frame, but I keep thinking that one of the apply functions should be employed...

Karin
--
Karin Lagesen, Ph.D.
karin.lage...@medisin.uio.no
http://folk.uio.no/karinlag
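A small check that scale() gives the column-wise z values asked for, using a made-up two-column data frame. The manual version writes the formula with explicit parentheses, (x - mean) / sd, which is what a z value requires.

```r
# Made-up data: each column is treated as its own data set.
oldDF <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 40, 80))

# One-liner from the reply above.
newDF <- as.data.frame(scale(oldDF))

# The same computation spelled out column by column.
manual <- as.data.frame(lapply(oldDF, function(col) (col - mean(col)) / sd(col)))

all.equal(newDF$a, manual$a)   # TRUE
all.equal(newDF$b, manual$b)   # TRUE
```

scale() centers on the column mean and divides by the column standard deviation by default, so no explicit apply() call is needed.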
Re: [R] Convert bits to numbers in base 10
Alternatively

    (nn <- c(1, 0, 0, 1, 0, 1, 0))
    [1] 1 0 0 1 0 1 0
    sum(2^(0:(length(nn)-1))*nn)

but of course it depends on whether your bits are stored big-endian or little-endian, so you might want

    sum(2^((length(nn)-1):0)*nn)

I like Marc's approach better (certainly more elegant). If you have the big vs little endian issue you can just remove the rev from Marc's code below.

Michael Conklin

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Marc Schwartz
Sent: Thursday, April 09, 2009 4:51 PM
To: Jorge Ivan Velez
Cc: R-help; Gang Chen
Subject: Re: [R] Convert bits to numbers in base 10

I suspect that Gang was looking for something along the lines of:

    sum(2 ^ (which(as.logical(rev(nn))) - 1))
    [1] 74

You might also want to look at the digitsBase() function in Martin's sfsmisc package on CRAN.

HTH,
Marc Schwartz

On Apr 9, 2009, at 4:34 PM, Jorge Ivan Velez wrote:
> Dear Gang,
> Try this:
>
>     nn <- c(1, 0, 0, 1, 0, 1, 0)
>     paste(nn, sep = "", collapse = "")
>
> See ?paste for more information.
> HTH,
> Jorge
>
> On Thu, Apr 9, 2009 at 5:23 PM, Gang Chen gangch...@gmail.com wrote:
>> I have some bits stored like the following variable nn
>>
>>     (nn <- c(1, 0, 0, 1, 0, 1, 0))
>>     [1] 1 0 0 1 0 1 0
>>
>> not in the format of 1001010 and I need to convert them to numbers in base 10. What's an easy way to do it?
>> TIA,
>> Gang
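The variants discussed in this thread can be compared side by side. The last line, which combines paste() with strtoi(), is an addition that does not appear in the thread.

```r
nn <- c(1, 0, 0, 1, 0, 1, 0)
n  <- length(nn)

# First element is the most significant bit: 1001010 in binary.
msb_first <- sum(2^((n - 1):0) * nn)

# First element is the least significant bit: 0101001 in binary.
lsb_first <- sum(2^(0:(n - 1)) * nn)

# Marc's which()-based version; it also treats the first element as most significant.
marc <- sum(2^(which(as.logical(rev(nn))) - 1))

# Paste the bits into a string and parse it as a base-2 number.
via_string <- strtoi(paste(nn, collapse = ""), base = 2)

c(msb_first = msb_first, lsb_first = lsb_first, marc = marc, via_string = via_string)
# 74, 41, 74, 74
```

The two sum() forms differ only in the direction of the exponent sequence, which is the whole of the bit-order question.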
Re: [R] Winsorizing Multiple Variables
Don't sort y. Calculate xbot and xtop using

    xtemp <- quantile(y, c(tr, 1 - tr), na.rm = na.rm)
    xbot <- xtemp[1]
    xtop <- xtemp[2]

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Karl Healey
Sent: Friday, January 16, 2009 2:51 PM
To: r-help@r-project.org
Subject: [R] Winsorizing Multiple Variables

Hi All,

I want to take a matrix (or data frame) and winsorize each variable, so that I can, for example, correlate the winsorized variables. The code below will winsorize a single vector, but when applied to several vectors, each ends up sorted independently in ascending order so that a given observation is no longer on the same row for each vector. So I need to winsorize the variable but then return it to its original order. Or another solution that will take a data frame, winsorize each variable, and return a new data frame with all the variables in the original order.

Thanks for any help!

-Karl

    # The function I'm working from
    win <- function(x, tr = .2, na.rm = F) {
      if (na.rm) x <- x[!is.na(x)]
      y <- sort(x)
      n <- length(x)
      ibot <- floor(tr * n) + 1
      itop <- length(x) - ibot + 1
      xbot <- y[ibot]
      xtop <- y[itop]
      y <- ifelse(y <= xbot, xbot, y)
      y <- ifelse(y >= xtop, xtop, y)
      win <- y
      win
    }

    # Produces an example data frame, ss is the observation id, vars 1-5 are the variables I want to winsorize.
    ss = c(1:5); var1 = rnorm(5); var2 = rnorm(5); var3 = rnorm(5); var4 = rnorm(5)
    as.data.frame(cbind(ss, var1, var2, var3, var4)) -> data
    data

    # Winsorizes each variable, but sorts them independently so the observations no longer line up.
    sapply(data, win)

___
M. Karl Healey
Ph.D. Student
Department of Psychology
University of Toronto
Sidney Smith Hall
100 St. George Street
Toronto, ON M5S 3G3
k...@psych.utoronto.ca
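Putting the quantile() suggestion into a complete function might look like the sketch below. It is one possible reading of the advice, not the poster's final code: the cut-offs come from quantile() rather than from the sorted order statistics, so they can differ slightly from the original function, and the clamping uses pmax()/pmin() in place of the two ifelse() calls.

```r
# Winsorize one vector without reordering it (tr is the tail proportion).
win <- function(x, tr = 0.2, na.rm = FALSE) {
  limits <- quantile(x, c(tr, 1 - tr), na.rm = na.rm, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

set.seed(1)
data <- data.frame(ss = 1:5, var1 = rnorm(5), var2 = rnorm(5))

# Winsorize every column except the observation id; rows stay aligned.
winsorized <- data
winsorized[-1] <- lapply(data[-1], win)
winsorized
```

Since nothing is sorted, each row of the result still refers to the same observation, so the winsorized columns can be passed straight to cor().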
Re: [R] running R-code outside of R
Spencer Graves wrote:
> If you want to hide the fact that you are using R -- especially if you charge people for your software that uses R clandestinely -- that's a violation of the license (GPL). I doubt if anyone associated with R would bother with a lawsuit, but a competitor who offers related software might.
> Best Wishes,
> Spencer

Do I understand the implication of the license correctly? (Forgive my ignorance here.) If I analyze a client's data using an R script I created, then I can charge the client a $20,000 consulting fee, but if I let the client push the button to execute the R script and charge him 10 cents for the privilege, then I can be sued for violating the GPL? Or are my assumptions on the first part also incorrect, and R can only be used for the free betterment of mankind?

Mike
Re: [R] LDA on pre-assigned training and testing data sets
I think this line

    mafdiscpred <- predict(mafdisc, data = test)

needs to be

    mafdiscpred <- predict(mafdisc, newdata = test)

Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.
6465 Wayzata Blvd. Suite 170
Minneapolis, MN 55426
Tel: 952.417.4719 | Mobile: 612.201.8978
[EMAIL PROTECTED]
MarketTools(r) http://www.markettools.com

This e-mail and any attachments may contain privileged, confidential or proprietary information. If you are not the intended recipient, be aware that any review, copying, or distribution of this e-mail or any attachment is strictly prohibited. If you have received this e-mail in error, please return it to the sender immediately, and permanently delete the original and any copies from your system. Thank you for your cooperation.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Peter Flom
Sent: Wednesday, June 25, 2008 11:22 AM
To: r-help@r-project.org
Subject: [R] LDA on pre-assigned training and testing data sets

Dear r-help

I am trying to run LDA on a training data set, and test it on another data set with the same variables. I found examples using crossvalidation, and using training and testing data sets set up with sample, but not when they are preassigned.

Here is what I tried

    # FIRST SET UP A DATAFRAME WITH ALL THE DATA AND CREATE NEW VARIABLES
    traintest1 <- arnaudnognod1[arnaudnognod1$DISC_USE1 == 1.01 | arnaudnognod1$DISC_USE1 == 1.03 | arnaudnognod1$DISC_USE1 == 1.04 | arnaudnognod1$DISC_USE1 == 1.02 | arnaudnognod1$DISC_USE1 == 1.05 | arnaudnognod1$DISC_USE1 == 1.06,]
    traintest1$normal <- traintest1$DISC_USE1 == 1.01 | traintest1$DISC_USE1 == 1.03 | traintest1$DISC_USE1 == 1.04
    traintest1$mafelev <- apply(traintest1[,1:40], 1, FUN = mean)
    traintest1$mafscatter <- apply(traintest1[,1:40], 1, FUN = sd)

    # NEXT CREATE TRAINING AND TESTING DATAFRAMES
    train <- traintest1[traintest1$DISC_USE1 == 1.01 | traintest1$DISC_USE1 == 1.02,]
    test <- traintest1[traintest1$DISC_USE1 > 1.02,]
    # NOW, TRAIN HAS 400 ROWS, TEST HAS 396 ROWS, AND TRAINTEST1 HAS 796 ROWS, EACH HAS 615 COLUMNS, AS EXPECTED

    # RUN DISCRIM ON TRAINING DATA
    mafdisc <- lda(normal ~ mafelev + mafscatter, data = train)
    # mafdisc$counts IS 210 AND 190, AS EXPECTED

    # FINALLY, TEST IT ON THE TEST DATA
    mafdiscpred <- predict(mafdisc, data = test)
    # BUT mafdiscpred$class HAS LENGTH = 400, NOT 396, AS EXPECTED.

any help appreciated

thanks

Peter

Peter L. Flom, PhD
Brainscope, Inc.
212 263 7863 (MTW)
212 845 4485 (Th)
917 488 7176 (F)
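The symptom in the post (400 predictions instead of 396) is what one would expect if the data argument is quietly ignored and the training rows are predicted again. A sketch with the built-in iris data, which stands in for the poster's data frame; the split and the choice of predictors are arbitrary.

```r
library(MASS)  # lda() lives in MASS, which ships with standard R installations

set.seed(42)
# iris stands in for the poster's data; the split below is arbitrary.
test_rows <- sample(nrow(iris), 50)
train <- iris[-test_rows, ]
test  <- iris[test_rows, ]

fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = train)

# 'data' is not an argument of predict() for lda fits, so it is silently
# ignored and the training data are predicted instead.
wrong <- predict(fit, data = test)
right <- predict(fit, newdata = test)

length(wrong$class)   # 100, the number of training rows
length(right$class)   # 50, the number of test rows
```

The same pitfall applies to most predict() methods in R: the argument for new observations is newdata, and unknown arguments are usually absorbed without a warning.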
Re: [R] request: a class having max frequency
The 0 is the name of the item and the 1 is the index in f of the maximum class (since f is a table, and the first element of the table is the maximum, which.max returns a 1). So, if you just want to know which class is maximum you can say

    names(which.max(f))

Michael Conklin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Muhammad Azam
Sent: Friday, June 06, 2008 8:15 AM
To: R Help; R-help request
Subject: [R] request: a class having max frequency

Dear R users

I have a very basic question. I tried but could not find the required result. Using

    dat <- pima
    f <- table(dat[,9])
    f
      0   1
    500 268

I want to find the class, say 0, having the maximum frequency, i.e. 500. I used

    which.max(f)

which provides

    0
    1

How can I get only the 0?

Thanks and best regards

Muhammad Azam
Ph.D. Student
Department of Medical Statistics, Informatics and Health Economics
University of Innsbruck, Austria
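A self-contained version of the same idea, with a synthetic vector that reproduces the 500/268 counts from the question (the pima data itself is not loaded here):

```r
# Synthetic class labels with the same counts as in the question.
classes <- c(rep(0, 500), rep(1, 268))
f <- table(classes)

which.max(f)                       # named integer: name "0", value 1 (its position in f)
names(which.max(f))                # "0"
as.numeric(names(which.max(f)))    # 0, if the label is needed as a number
```

The value returned by which.max() is a position in the table, so it only coincides with the class label by accident; the label always lives in the names.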
Re: [R] Percentages for categorical data by group
    tapply(example.data$responseVar, example.data$groupVar, function(x){prop.table(table(x))})

Michael Conklin

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Economics Guy
Sent: Friday, May 23, 2008 9:52 AM
To: [EMAIL PROTECTED]
Subject: [R] Percentages for categorical data by group

I can think of several ways to blunt force hard code what I want but I imagine there is a command or two that can be easily combined to do this:

I have a data frame with about 23000 observations. The first variable is the group to which the observation belongs (about 500 different groups). The second variable is a response for each observation that is a 1, 2, 3, 4 or 5.

I want to be able to calculate the percentage of each group that chose each response. For example I want to know what percentage of group 1 (which may have a value of 34456) chose response 1 and so on.

Here is some code I wrote that generates a data frame like the one I have.

    pop <- matrix(1:10)
    groupIDs <- sample(pop, 500)
    groupVar <- sample(groupIDs, 23000, replace = TRUE)
    responseVar <- sample(1:5, 23000, replace = TRUE)
    example.data <- data.frame(groupVar, responseVar)

Is there a fast way to calculate these percentages beyond writing loops to manually count the responses for each of the groups?

Thanks,
EG
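A reduced version of the example, small enough to inspect by eye. The group ids 101, 202 and 303 are made up. The second half shows a cross-tabulation with row proportions, which is an alternative that was not proposed in the reply but gives the same numbers as one matrix.

```r
set.seed(123)
# Three made-up group ids and responses 1 to 5.
example.data <- data.frame(
  groupVar    = sample(c(101, 202, 303), 300, replace = TRUE),
  responseVar = sample(1:5, 300, replace = TRUE)
)

# One table of proportions per group, as in the reply above.
res <- tapply(example.data$responseVar, example.data$groupVar,
              function(x) prop.table(table(x)))
res[["101"]]

# Alternative: cross-tabulate once and convert each row to proportions.
by_group <- prop.table(table(example.data$groupVar, example.data$responseVar),
                       margin = 1)
round(by_group, 3)
```

Multiplying either result by 100 gives percentages.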
Re: [R] Percentages for categorical data by group
prop.table(table(factor(x,levels=1:5)))

Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.

-----Original Message-----
From: Economics Guy
Sent: Friday, May 23, 2008 1:36 PM
Subject: Re: [R] Percentages for categorical data by group

I appreciate all the help. The trouble is that in my real data set, not every group has an observation for each response. As a result, some of the rows returned from prop.table() are shorter than others, so I get:

Warning message:
In function (..., deparse.level = 1) :
  number of columns of result is not a multiple of vector length (arg 8)

Is there a way to tell rbind() or do.call() to treat missing values as zero, or to make prop.table() include the zero proportions?

On Fri, May 23, 2008 at 1:59 PM, Phil Spector wrote:

EG - Thanks for the reproducible example! When I run your code and check the class of the result from tapply(), I see that it is an array, and using dim(), I see it's an array of length 500. How big is each element?

table(sapply(res,length))

  5
500

So each piece is the same length.
That means we could make a 500x5 matrix as follows:

do.call(rbind,res)

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
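Putting the pieces of this thread together, here is a minimal sketch of how the factor() fix and the do.call(rbind, ...) step combine. Fixing the levels keeps all five responses in every group's table, so every piece has the same length. The data frame `toy` and the object names `res` and `pct` are assumptions for illustration only.

```r
# Hypothetical toy data; group 202 only ever chooses response 3
toy <- data.frame(
  groupVar    = c(101, 101, 101, 101, 202, 202, 303, 303, 303, 303),
  responseVar = c(1, 1, 2, 5, 3, 3, 1, 2, 4, 4)
)

# Declaring levels 1:5 makes table() report a zero count for unused responses
res <- tapply(toy$responseVar, toy$groupVar,
              function(x) prop.table(table(factor(x, levels = 1:5))))

# Every element now has length 5, so the pieces stack into a groups-by-5 matrix
pct <- do.call(rbind, res)
print(pct)
```

Without the levels argument, group 202 would contribute a length-1 table and rbind() would recycle it with the warning quoted above.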
Re: [R] Problem with R version 2.6.0
On Fri, 9 Nov 2007, Prof Brian Ripley wrote:

This is of course not how the rw-FAQ suggests you make use of R, and the best recommendation is to follow the FAQ's workflow.

The workflow recommendation that I read in the FAQ is:

2.5 How do I run it?
Just double-click on the shortcut you prepared at installation. If you want to set up another project, make a new shortcut or use the existing one and change the `Start in' field of the Properties.

---

I am wondering why this is the best workflow. I work on several hundred projects per year (R is definitely a production vehicle for us) and can have as many as 20 going at the same time. Having a single shortcut and changing the working directory on startup to the appropriate folder (as opposed to changing the `Start in' property on the shortcut) is much more efficient for me than creating a new shortcut for each project. The beauty of R is that there are multiple ways to do many things, and the user can find the way that is best for him.

Michael Conklin