Re: [R] How do I know where is R? - VB Programming
Duncan Murdoch wrote:

On 14/07/2009 7:22 PM, Haoda Fu wrote: Dear all - Does anyone know how to let VB or C# detect automatically where R is installed (i.e. auto-detect the R directory)?

On Windows, if you run the installer it will record its location in the registry, under *\Software\R-core\R\, where * is HKLM or HKCU, depending on what permissions the user who installed it had. But some users don't run the installer (maybe because someone else installed it on a network, or they built it themselves), so that isn't completely reliable.

On Unix/Linux we have

$ R RHOME
/usr/lib/R

but the same thing doesn't work with Rterm on Windows AFAICS. I wonder if it shouldn't?

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907 (p.dalga...@biostat.ku.dk)

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Correlation question (from a newbie)
Dear R-users, please ignore my most recent posting. Found the solution, thanks to David Winsemius. Thanks, Santosh

On Tue, Jul 14, 2009 at 9:14 PM, Santosh santosh2...@gmail.com wrote: Dear R-users, I hope the following scenario explains my question better. Continuous variables: AGE, WEIGHT, HEIGHT. Categorical variables: Group, Sex, Race. I would like to find a correlation between WEIGHT and AGE, grouped by Group, Sex, and Race. Is the following formula correct?

tapply(dat$WEIGHT, by=list(dat$AGE,as.factor(dat$Group),as.factor(dat$EX),as.factor(dat$RACE)), cor)

Thanks, Santosh

On Tue, Jul 14, 2009 at 7:34 PM, Santosh santosh2...@gmail.com wrote: Hi R-users, I was wondering if there is a way to quickly compute correlations between continuous variables grouped by some categorical variables? What function do I use? Thanks much in advance. Regards, Santosh
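For archive readers: tapply() has no by argument and operates on a single vector, so it cannot compute a correlation between two columns. A minimal sketch of one way to do it with by(), on simulated data (the data frame and its column names here are invented for illustration):

```r
# Simulated data standing in for the poster's dataset (names are hypothetical)
set.seed(42)
dat <- data.frame(
  AGE    = runif(120, 20, 70),
  WEIGHT = rnorm(120, 75, 10),
  Group  = sample(c("A", "B"), 120, replace = TRUE),
  SEX    = sample(c("M", "F"), 120, replace = TRUE)
)

# One correlation of WEIGHT and AGE per Group x SEX cell
res <- by(dat, list(dat$Group, dat$SEX),
          function(d) cor(d$WEIGHT, d$AGE))
res
```

Adding a third grouping factor (Race, in the original question) is just one more element in the list passed to by().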
Re: [R] (simple) xml into data.frame and reverse
Duncan Temple Lang wrote: I wrote some relatively general, but hastily written, functions to read this sort of data. You can find them attached or at http://www.omegahat.org/RSXML/xmlToDataFrame.xml

Looks like that's the wrong link. I also did not find it mentioned on the parent page. Dieter

-- View this message in context: http://www.nabble.com/%28simple%29-xml-into-data.frame-and-reverse-tp24488291p24492792.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Grouping data in dataframe
Timo Schneider wrote: I have a dataframe (obtained from read.table()) which looks like

  ExpA ExpB ExpC Size
1   12   23   33    1
2   12   24   29    1
3   10   22   34    1
4   25   50   60    2
5   24   53   62    2
6   21   49   61    2

Now I want to take all rows that have the same value in the Size column and apply a function (for example median()) to the columns of these rows. The result should be a new dataframe with the medians of the groups.

Besides the already mentioned aggregate, you could use one of the functions in package plyr. Dieter
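The aggregate() suggestion can be sketched directly on the data above (re-entered here as a small data frame, with column values as read from the post):

```r
# The example data frame from the post
dat <- data.frame(
  ExpA = c(12, 12, 10, 25, 24, 21),
  ExpB = c(23, 24, 22, 50, 53, 49),
  ExpC = c(33, 29, 34, 60, 62, 61),
  Size = c(1, 1, 1, 2, 2, 2)
)

# One median per Size group for each experiment column
med <- aggregate(dat[c("ExpA", "ExpB", "ExpC")],
                 by = list(Size = dat$Size), FUN = median)
med
```

The result is a data frame with one row per Size value, which is exactly the shape the poster asked for.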
Re: [R] SOS! error in GLM logistic regression...
On Tue, 2009-07-14 at 15:57 -0700, Michael wrote: Hi all, could anybody tell me what happened to my logistic regression in R?

mylog = glm(mytraindata$V1 ~ ., data = mytraindata, family = binomial(logit))

It generated the following error message:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor 'state1' has new level(s) AP

Hi Michael, I am 99.9% certain that what you claim above is completely false. That error looks to arise from a call in predict.lm, which is called when you use predict.glm. The big giveaway for me was 'newdata' in the call that produced the error. That suggested to me you were using a predict method. I then tracked down a potential source of the error by looking at what happens when you call predict on a glm object. Members of the list shouldn't have to do this to help answer posts to the list. The posting guide asks you to provide minimal, self-contained, reproducible examples; failing that, the R code you actually used.

After the chastisement ;-) some help:

1) I'm guessing, but in the dataset that you predict for, does the state1 variable have more/different levels than the training data? Try levels(mytraindata$state1) and levels(mytestdata$state1), where mytestdata is the data you supplied as 'newdata' in your call to predict.

2) Your glm call contains some redundancy. It can be simplified to:

mylog <- glm(V1 ~ ., data = mytraindata, family = binomial)

realising that the logit link is the default in the binomial family, and that if you specify data you don't need to refer to the data object in the formula.

HTH G

--
Dr. Gavin Simpson, ECRC, UCL Geography, Pearson Building, Gower Street, London, UK. WC1E 6BT. [t] +44 (0)20 7679 0522 [f] +44 (0)20 7679 0565 [e] gavin.simpsonATNOSPAMucl.ac.uk [w] http://www.ucl.ac.uk/~ucfagls/
[w] http://www.freshwaters.org.uk
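Point 1 above can be reproduced in a few lines; a sketch with invented data showing that predict() on a glm object fails in exactly this way when the new data contains a factor level absent from the training data:

```r
set.seed(1)
# Invented training data: binary response plus a two-level factor
train <- data.frame(V1 = rbinom(50, 1, 0.5),
                    state1 = factor(sample(c("NY", "CA"), 50, replace = TRUE)))
fit <- glm(V1 ~ ., data = train, family = binomial)

# "AP" was never seen during fitting, so predict() must refuse it
test <- data.frame(state1 = factor(c("NY", "AP")))
res <- try(predict(fit, newdata = test, type = "response"), silent = TRUE)
inherits(res, "try-error")  # TRUE: factor state1 has new level(s) AP
```

The usual fix is to refit with the union of levels, or to drop/recode the unseen levels in the prediction data before calling predict().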
[R] The greatest common divisor between more than two integers
Hi, Does anybody know if there is a function in R which computes the greatest common divisor of several (more than two) integers? Best, Atte

Atte Tenkanen, University of Turku, Finland, Department of Musicology, +35823335278, http://users.utu.fi/attenka/
Re: [R] The greatest common divisor between more than two integers
On Wed, Jul 15, 2009 at 8:55 AM, Atte Tenkanen atte...@utu.fi wrote: Do somebody know if there is a function in R which computes the greatest common divisor between several (more than two) integers?

Is there a function for computing the greatest common divisor of *two* numbers? I can't find one, but assume that there is such a function, and call it gcd. Then you could define a recursive function to do the job. Something like

new_gcd <- function(v) {
  if (length(v) == 2) return(gcd(v[1], v[2]))
  return(gcd(v[1], new_gcd(v[2:length(v)])))
}

where v is a vector containing the numbers you want to calculate the greatest common divisor of.

-- Michael Knudsen micknud...@gmail.com http://lifeofknudsen.blogspot.com/
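Base R indeed has no gcd(); a sketch of the missing two-argument helper (Euclidean algorithm; the names gcd2 and gcd_all are invented here), plus Reduce(), which folds it over a whole vector without explicit recursion, since gcd is associative:

```r
# Euclidean algorithm for two integers (hypothetical helper; not in base R)
gcd2 <- function(a, b) {
  while (b != 0) {
    tmp <- b
    b <- a %% b
    a <- tmp
  }
  abs(a)
}

# Fold the pairwise gcd over a whole vector of integers
gcd_all <- function(v) Reduce(gcd2, v)

gcd_all(c(12, 18, 30))  # 6
```

Reduce(gcd2, v) computes gcd2(gcd2(v[1], v[2]), v[3]) and so on, which is the same as the recursive formulation in the post.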
Re: [R] How to deploy statistical models built in R in real-time?
I am framing this as a question since I would like to know how folks are currently deploying the models they build in R. Say you want to use the results of your model inside another application in real time or on demand - how do you do it? How do you use the decisions you get back from your models?

Late answer, sorry. I love PMML (and have been advocating it since at least version 2.0) but I rarely see it deployed in commercial companies. What I see, in decreasing order of importance:

1. Pre-scoring. That is, pre-calculate the scores of your model for each customer and stuff them into a database that your operational system can access. Example: customer churn in mobile telco.

2. Convert the model to SQL. This is obviously easier for some model types (trees, k-nearest neighbour, ...) than others. This is surprisingly common. Example: a Big Famous Data Insights Company created a global customer segmentation model (really: 'cause all markets and cultures are the same) for a multi-national company and distributed it as a Word document with pseudo-SQL fragments for each country to implement. Gets over the problem of different technologies in different countries.

3. Pre-scoring for multiple likely events. Example: for cross- and up-sell in a call centre (which is phenomenally effective) you really want to include the outcome of the original call as an input to the propensity model. A badly handled complaint call does not offer the same opportunities for flogging more products as a successful upgrade to a higher price plan (but might be an opportunity to extend an (expensive) retention offer). The Right Way to do this is to run the model in real time, which would usually mean PMML if you have created the model in R. At least one vendor recommended "just" pre-scoring the model for each possible (relevant) call outcome and storing that in the operational database. That vendor also sold databases :-)

4. Use PL/R to embed R within your (PostgreSQL) RDBMS. (Rare.)

5. Embed R within your operational application and run the model that way (I have done this exactly once).

Somewhere between 1 and 2 is an approach that doesn't really fit with the way you framed the question (and is probably OT for this list). It is simply this: if you want to deploy models for real-time or fast on-demand usage, usually you don't implement them in R (or SAS or SPSS). In Marketing, which is my main area, there are dedicated tools for real-time decisioning and marketing like Oracle RTD [1], Unica inbound marketing [2], Chordiant Recommendation Advisor, and others [3], though only the first of these can realistically be described as "modelling". Happy to discuss this more offline if you want. And I really like your approach - hope to actually use it some day. Allan.

More at http://www.pcagroup.co.uk/ and http://www.cybaea.net/Blogs/Data/

[1] http://www.oracle.com/appserver/business-intelligence/real-time-decisions.html
[2] http://www.unica.com/products/Inbound_Marketing.htm [web site down at time of writing]
[3] E.piphany and SPSS Interaction Builder appear to be nearly dead in the market.

On 08/07/09 23:38, Guazzelli, Alex wrote: [...] As you may know, a PMML package is available for R that allows many mining models to be exported into the Predictive Model Markup Language. PMML is the standard way to represent models and can be exported from most statistical packages (including SPSS, SAS, KNIME, ...). Once your model is represented as a PMML file, it can easily be moved around. PMML allows for true interoperability. We have recently published an article about PMML in The R Journal. It basically describes the PMML language and the package itself.
If you are interested in finding out more about PMML and how to benefit from this standard, please check the link below. http://journal.r-project.org/2009-1/RJournal_2009-1_Guazzelli+et+al.pdf

We have also written a paper about open standards and cloud computing for the SIGKDD Explorations newsletter. In this paper, we describe the ADAPA Scoring Engine, which executes PMML models and is available as a service on the Amazon cloud. ADAPA can be used to deploy R models in real time from anywhere in the world. I believe it represents a revolution in data mining since it allows anyone that uses R to make effective use of predictive models at a cost of less than $1/hour. http://www.zementis.com/docs/SIGKDD_ADAPA.pdf

Thanks! Alex
[R] Starting RCmdr from the Commandline (and a shortcut) in Windows
I thought I'd share the experience I had in getting Rcmdr to start automatically when R starts up. I'd read of solutions that would start Rcmdr every time R started up, but I didn't want it to start every time, only when I wanted it to.

First, I created a shortcut on the desktop to R and changed its name to Rcmdr. Next, I changed the target of the shortcut to (edit the version number if needed):

"C:\Program Files\R\R-2.9.1\bin\Rgui.exe" --sdi RCMDR=TRUE

I added the option --sdi (I like this interface the best) and created the RCMDR=TRUE environment variable. Next, I edited the Rprofile.site file in the etc directory of the R installation and added the following lines to the bottom:

defpack = getOption("defaultPackages")
mylibs = c("lattice", "MASS", "Matrix")
if (Sys.getenv("RCMDR") == "TRUE") mylibs = c(mylibs, "Rcmdr")
options(defaultPackages = c(defpack, mylibs))

Note that mylibs is the list of libraries that I wanted autoloaded. Although Matrix will autoload lattice, I didn't want it to announce it every time R started, so I loaded it first.

Now clicking on Rcmdr on the desktop starts up Rcmdr (in addition to R), and starting up R regularly does not start Rcmdr, which is what I wanted! I hope someone else may find this useful! -Scott

* Scott K. Hyde, Assistant Professor of Statistics and Mathematics, College of Math and Sciences, Brigham Young University -- Hawaii, Laie, HI 96762
Re: [R] How do I know where is R? - VB Programming
This works on Linux (at least under the bash shell) and also on Windows Vista, and presumably earlier versions of Windows as well:

R --vanilla --slave -e "R.home()"

but the R part at the beginning does have the problem that it only works if R is on your path. If, on Windows, you normally double-click the R icon on your desktop or use the R utilities in the batchfiles distribution, there would be no need to have R on your path. The batchfiles utilities use the registry, and failing that heuristics, to locate R.

On Wed, Jul 15, 2009 at 2:13 AM, Peter Dalgaard p.dalga...@biostat.ku.dk wrote: [...]
[R] loading multiple .Rdata and preserving variable names
Dear R-users, I need to load outputs from multiple previous calculations into one R session for comparison and (cross-)analysis. The previous calculations are stored in different directories in .Rdata files (I don't know if this is the best storage for later usage, but the previous project could potentially be recalculated and saved into a different format if needed.) I can load multiple .Rdata files consecutively (in which the variable names are identical), but I don't know how to preserve the variables from these files or make them unique. I would imagine pointing to them as in a list:

# loading 2, max. 4 outputs of previous calculations
load(DataSet1) # VariableA is present
load(DataSet2) # VariableA is present, too
# both VariableA listed and present
DataSet1$VariableA$parameters
DataSet2$VariableA$parameters

But what is the way to feed all variables into a list? Or more generally, what is an efficient way to work with multiple separate outputs which one would like to compare? (I have read something about environments, but I understood they are only for functions. I could create environments, but was not successful in using them at all (location change or separate sandbox). The functions saveCache, loadCache, saveObject, loadObject are not really what I have in mind either -- saveObject(list=ls(), NewObjectFile) is not a solution...) Thanks for any hint in advance. Cheers, Zroutik
Re: [R] Averaging dataframes that are stored in a list
Hi, you could try the reshape package:

l = list(d1 = data.frame(ID = letters[1:4], value = 1:4),
         d2 = data.frame(ID = letters[1:4], value = -c(1:4)))
library(reshape)
m = melt(l)
cast(m, . ~ ID, fun.aggregate = mean)

HTH, baptiste

2009/7/15 Mark Na mtb...@gmail.com: Dear R-helpers, I have a list containing 5000 elements; each element is a dataframe containing one ID column (identical over the 5000 dataframes) and 9 numeric variables, e.g. ID VAR1 VAR2 VAR3 ... VAR9. I would like to create a new dataframe containing the ID column and the mean values of the 9 numeric variables. So the structure of this new dataframe would be identical to the structure of the dataframes stored in my list (and the ID column would also be identical), but the values would be mean values. I've been attempting to do this with rowMeans and subscripting the list using double square brackets, but I can't get it to work. I'd appreciate any pointers to get me going, thanks! Mark Na

--
Baptiste Auguié, School of Physics, University of Exeter, Stocker Road, Exeter, Devon, EX4 4QL, UK. Phone: +44 1392 264187. http://newton.ex.ac.uk/research/emag
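A base-R alternative sketch, assuming (as in the original question) that the ID column is identical across all data frames: sum the numeric columns with Reduce() and divide by the number of frames:

```r
# Two toy data frames standing in for the 5000-element list
l <- list(d1 = data.frame(ID = letters[1:4], value = 1:4),
          d2 = data.frame(ID = letters[1:4], value = -(1:4)))

num <- lapply(l, function(d) d[, -1, drop = FALSE])  # drop the ID column
avg <- Reduce(`+`, num) / length(num)                # elementwise mean
result <- data.frame(ID = l[[1]]$ID, avg)
result
```

With 9 numeric columns instead of 1, the same two lines work unchanged, because `+` on data frames is elementwise across all columns.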
Re: [R] loading multiple .Rdata and preserving variable names
On 15/07/2009 5:00 AM, Žroutík wrote: [...]

Environments are basic in R. If they seem to be only for functions, it's because everything in R is done by functions. Given two .RData files, you can load them into separate environments. Dealing with conflicting variable names is another issue, of course. For example:

DataSet1 <- new.env()
load("somefile", envir = DataSet1)

Now use ls(DataSet1) to see what you've got; repeat for DataSet2, etc. You can get a variable as DataSet1$VariableA. Duncan Murdoch
Re: [R] loading multiple .Rdata and preserving variable names
On Wed, Jul 15, 2009 at 10:41 AM, Duncan Murdoch murd...@stats.uwo.ca wrote: Given two .RData files, you can load them into separate environments. [...]

As an alternative to loading, you can attach() the .RData files and fetch variables using get(). Create a couple of .RData files with an 'x' in them:

x <- 1
save(x, file = "x1.RData")
x <- 2
save(x, file = "x2.RData")
q()

Now in a new R session:

attach("x1.RData", warn = FALSE)
attach("x2.RData", warn = FALSE)
get("x", pos = "file:x1.RData")
[1] 1
get("x", pos = "file:x2.RData")
[1] 2

I've used warn = FALSE to suppress the warnings that attach() gives when attached objects mask other objects of the same name. You can also detach() these things once you're done with them. Barry
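Pulling the two answers together, one sketch is to load each file into its own environment and convert that environment to a named list, so the objects can be addressed as DataSets$DataSet1$x and nothing ever collides in the global workspace (the file names below are created on the fly for illustration):

```r
# Create two example .RData files in a temp dir, each containing an 'x'
tmp <- tempdir()
x <- 1; save(x, file = file.path(tmp, "x1.RData"))
x <- 2; save(x, file = file.path(tmp, "x2.RData"))

files <- file.path(tmp, c("x1.RData", "x2.RData"))
DataSets <- lapply(files, function(f) {
  e <- new.env()
  load(f, envir = e)
  as.list(e)          # turn the environment into an ordinary list
})
names(DataSets) <- c("DataSet1", "DataSet2")

DataSets$DataSet1$x   # 1
DataSets$DataSet2$x   # 2
```

This scales to any number of result directories: build `files` with list.files(..., pattern = "\\.RData$") and name the list after the directories.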
[R] (newbie) sum for certain number of rows
I have the following data in a data.csv file, separated by spaces:

0 0 1 0 0 1 0 1
0 0 0 0 0 0 0 0
1 0 0 1 1 0 1 0
0 0 1 1 0 0 0 0
1 1 0 0 0 0 1 1
0 0 0 0 0 0 0 0
0 1 0 1 1 0 1 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
etc...

I wish to calculate the sum of each column for a certain number of rows. For example, if I want sums of the data after each 3 rows, it should display

1 0 1 1 1 1 1 1
1 1 1 1 0 0 1 1
2 3 2 3 3 2 3 2

So far, this is what I have done:

xx <- read.table("data.csv", header = FALSE)
ss <- t(apply(xx, 2, sum)) # which displayed the sum of all rows

I tried my best to look for a solution on the Internet but so far haven't managed to find it. I would be extremely grateful if someone could point me to how to go about it. Thanks. Kelvin
[R] problems in resampling time series, block length, trend.test
Hi, I have a time series (say x) of 13 years showing an evident increase. I want to exclude two observations (the fourth and 10th), so I do trend.test(x[-c(4,10)]) where:

x[-c(4,10)]
[1] 7 37 79 72 197 385 636 705 700 1500 1900

and I get:

Spearman's rank correlation rho
data: x[-c(4, 10)] and time(x[-c(4, 10)])
S = 4, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.9818182

A very strong positive correlation: the x is increasing. Now, I would like to resample this time series, because I have other time series where the trend is more uncertain and the sample size is still small. I read "Resampling Methods in R: The boot Package" (ISSN 1609-3631) and I believe that a way of doing it is by a block bootstrap, which allows one to deal with the autocorrelation (I know that some of my time series show autocorrelation, lag=1). I used the trend.test function (library pastecs) and I did:

trend.x = trend.test(x[-c(4,10)], R=999)
trend.x
trend.x$p.value
plot(trend.x)

And I get:

trend.x = trend.test(x[-c(4,10)], R=999)
trend.x
BLOCK BOOTSTRAP FOR TIME SERIES
Fixed Block Length of 1
Call: tsboot(tseries = x, statistic = test.trend, R = R, l = 1, sim = "fixed")
Bootstrap Statistics :
     original       bias    std. error
t1* 0.9818182  -0.9844001   0.3272114
trend.x$p.value
[1] 0

I suppose that the problem arises from the length of the block (1), and in this way I get rho=0 (nevertheless I don't understand how it is significant). I would like to change the block length but I am not able to (I tried several different ways, unsuccessfully). So, two questions: 1) How can I change the block length? 2) In terms of block length and type of simulation (sim="fixed" or "geom"), what is the best way of doing it?
Thanks in advance for any suggestion, Best wishes Simone
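On question 1: the block length is the l argument of tsboot() in the boot package, which trend.test() wraps, so a longer block can be requested by calling tsboot() directly. A sketch on the series from the post, with a hand-written Spearman-rho statistic standing in for the one pastecs uses internally:

```r
library(boot)

# The series from the post (after dropping the 4th and 10th observations)
x <- c(7, 37, 79, 72, 197, 385, 636, 705, 700, 1500, 1900)

# Spearman correlation of the series with time, as the bootstrap statistic
rho_stat <- function(ts) cor(ts, seq_along(ts), method = "spearman")

set.seed(123)
b <- tsboot(x, statistic = rho_stat, R = 999,
            l = 3, sim = "fixed")   # fixed blocks of length 3
b$t0                                # observed rho on the original series
```

sim = "geom" instead draws blocks of geometrically distributed length with mean l (the stationary bootstrap), which avoids the artefacts of a fixed block boundary; either way l should reflect the autocorrelation range, so l = 1 (which resamples single points and destroys the trend, hence the rho near 0 in the output above) is rarely what you want.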
Re: [R] (newbie) sum for certain number of rows
One way is the following:

dat <- read.table(textConnection(
"0 0 1 0 0 1 0 1
0 0 0 0 0 0 0 0
1 0 0 1 1 0 1 0
0 0 1 1 0 0 0 0
1 1 0 0 0 0 1 1
0 0 0 0 0 0 0 0
0 1 0 1 1 0 1 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1"))
closeAllConnections()

k <- 3
ind <- rep(seq(1, nrow(dat) / k), each = k)
rowsum(dat, ind)

I hope it helps. Best, Dimitris

kelvin lau wrote: [...]

--
Dimitris Rizopoulos, Assistant Professor, Department of Biostatistics, Erasmus University Medical Center. Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands. Tel: +31/(0)10/7043478, Fax: +31/(0)10/7043014
[R] interacting variables in a formulae in R
Dear R mailing list subscribers, I have a question about how R handles interacting variables in model creation, using glm for instance. If I write:

model = glm(solution ~ descriptor1 + descriptor2 + descriptor3, family = binomial(link = "logit"))

I end up with coefficients for a logistic model that I can easily plug into the mathematical form of such a model (a weighted sum of these descriptors). But how do I interpret what R does with interacting terms, like this?

model = glm(solution ~ descriptor1:descriptor2 + descriptor3, family = binomial(link = "logit"))

What does R do with the numerical values of descriptor1 and descriptor2 in order to fit the data? How can I rewrite the model in mathematical form? Thanks in advance. Best regards.

-- Peter Schmidtke, PhD Student at the Molecular Modeling and Bioinformatics Group, Dep. Physical Chemistry, Faculty of Pharmacy, University of Barcelona
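For archive readers, the answer to the last question: for numeric variables, descriptor1:descriptor2 enters the linear predictor as their elementwise product, so the fitted model is logit(p) = b0 + b1*(descriptor1*descriptor2) + b2*descriptor3. A sketch with simulated data (all names invented) confirming that the interaction column of the design matrix is just that product:

```r
set.seed(1)
# Invented numeric descriptors and a binary response
d1 <- rnorm(20); d2 <- rnorm(20); d3 <- rnorm(20)
y  <- rbinom(20, 1, 0.5)

fit <- glm(y ~ d1:d2 + d3, family = binomial(link = "logit"))
X   <- model.matrix(fit)
colnames(X)   # main effects come before the interaction: "(Intercept)", "d3", "d1:d2"

# The interaction column is exactly the elementwise product d1 * d2
all.equal(unname(X[, "d1:d2"]), d1 * d2)
```

For factors, a:b instead expands to products of the dummy columns of a and b, one coefficient per level combination, so the mathematical form depends on the contrasts in use.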
Re: [R] plotting confidence intervals
Erin Hodgess wrote: Hi R people: If I have fitted values from a model, how do I plot the (1-alpha)100% confidence intervals along with the fitted values, please? Also, if the intervals could be shaded gray, that would be nice too, please? I checked confint, but that doesn't seem to do what I want.

Hi Erin, probably the simplest way is to plot points with error bars, as in the following example:

library(plotrix)
coefficients <- rnorm(5) + 4
plot(coefficients, main = "Coefficient plot", xaxt = "n",
     ylim = c(0, 6), xlab = "Coefficients")
axis(1, at = 1:5, labels = paste("Coef", 1:5, sep = ""))
dispersion(1:5, coefficients, abs(rnorm(5)), col = "lightgray")

The dispersion function in the plotrix package is just one way to illustrate confidence intervals. Jim
Re: [R] ggplot2: geom_errorbarh()
Hi, I expect 'width' to set the width of the horizontal line of the whisker, as it does with errorbar(). Regards/Cordialement, Benoit Boulinguiez

-----Original Message-----
From: hadley wickham [mailto:h.wick...@gmail.com]
Sent: Sunday, 12 July 2009 11:04
To: Benoit Boulinguiez
Cc: r-help@r-project.org
Subject: Re: [R] ggplot2: geom_errorbarh()

Hi Benoit, what do you expect width to do? You are already setting the left and right extents with xmin and xmax. Hadley

On Thu, Jul 9, 2009 at 10:37 AM, Benoit Boulinguiez benoit.boulingu...@ensc-rennes.fr wrote: Hi all, quick question: is the optional command width effective in the geom_errorbarh() layer of ggplot? Because I can't get it to work on this graph http://www.4shared.com/file/116919103/93488d88/iso_2PrsH.html

pdf(file = "iso_2PrsH.pdf", width = 7, height = 7)
NC60.iso.graph <- ggplot(NC60.DATA, aes(Ce, Qe)) +
  geom_point(col = MaCouleur1, size = 4) +
  geom_errorbar(aes(ymax = NC60.DATA$Qe + NC60.DATA$sdQe,
                    ymin = NC60.DATA$Qe - NC60.DATA$sdQe),
                colour = alpha("black", 0.4), width = 1) +
  geom_errorbarh(aes(xmax = NC60.DATA$Ce + NC60.DATA$sdCe,
                     xmin = NC60.DATA$Ce - NC60.DATA$sdCe),
                 colour = alpha("black", 0.4), width = 1) +
  geom_line(data = NC60.Res4.curve, aes(x, y),
            size = 1, colour = alpha("black", 0.5)) +
  xlab(C[e]~(mmol/m^3)) +
  ylab(q[e]~(mmol/m^3))
print(NC60.iso.graph)
dev.off()

Regards/Cordialement

-----
Benoit Boulinguiez, Ph.D student, Ecole de Chimie de Rennes (ENSCR), Bureau 1.20, Equipe CIP, UMR CNRS 6226 Sciences Chimiques de Rennes, Avenue du Général Leclerc, CS 50837, 35708 Rennes CEDEX 7. Tel +33 (0)2 23 23 80 83, Fax +33 (0)2 23 23 81 20, http://www.ensc-rennes.fr/
-- http://had.co.nz/
[R] end of daylight saving time
Hi, I read from the help on DateTimeClasses and various posts on this list that, quite logically, one needs to specify whether DST is in effect for the first hour after the change from DST to ST in autumn. Hence, in my time zone and on Mac OS X, I can do this:

a <- as.POSIXct("2008-11-02 01:30:00", tz="EST5EDT") # to get automatic use of DST
b <- as.POSIXct("2008-11-02 01:30:00", tz="EST") # to tell R this is the second occurrence of 1:30 that day, in ST
difftime(b, a)

But why can't I do this, to handle several date-times at once?

c <- rep("2008-11-02 01:30:00", 2)
tzone <- c("EST5EDT", "EST")
as.POSIXct(c, tz=tzone)
Erreur dans strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) : valeur 'tz' incorrecte
(Error in strptime(...): invalid 'tz' value)

??? Thanks, Denis Chabot

sessionInfo() R version 2.9.1 Patched (2009-07-09 r48929) x86_64-apple-darwin9.7.0 locale: fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_2.9.1
Re: [R] (newbie) sum for certain number of rows
In the zoo package, rollapply can do that.

library(zoo)
out <- rollapply(ts(dat), width = 3, by = 3, sum)
out
Time Series: Start = 3 End = 9 Frequency = 0.333
  V1 V2 V3 V4 V5 V6 V7 V8
3  1  0  1  1  1  1  1  1
6  1  1  1  1  0  0  1  1
9  2  3  2  3  3  2  3  2

This gives a ts class time series result, so if you need a data frame result just convert it back:

out <- as.data.frame(out)

See ?rollapply for more.

On Wed, Jul 15, 2009 at 6:03 AM, kelvin lau kelvin...@yahoo.com wrote: I have the following data in a data.csv file, separated by spaces:

0 0 1 0 0 1 0 1
0 0 0 0 0 0 0 0
1 0 0 1 1 0 1 0
0 0 1 1 0 0 0 0
1 1 0 0 0 0 1 1
0 0 0 0 0 0 0 0
0 1 0 1 1 0 1 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
etc...

I wish to calculate the sum of each column over a certain number of rows. For example, if I want the sum of the data after each 3 rows, it should display

1 0 1 1 1 1 1 1
1 1 1 1 0 0 1 1
2 3 2 3 3 2 3 2

So far, this is what I have done:

xx <- read.table("data.csv", header=FALSE)
ss <- t(apply(xx, 2, sum)) # which displayed the sum of all rows

I tried my best to look for a solution on the Internet but so far haven't managed to find it. I would be extremely grateful if someone could point me in the right direction. Thanks. Kelvin
[R] Quotes - useFancyQuotes
I need to get the exact sentence, with the quotes: OPTIONS COORDSYS("Surface 1" AS COMPONENT); but this:

options(useFancyQuotes = FALSE)
paste("OPTIONS COORDSYS(", dQuote("Surface 1"), " AS COMPONENT);")

gives me this:

[1] "OPTIONS COORDSYS( \"Surface 1\" AS COMPONENT); "

and it's not readable as SpatialSQL code by the GIS. How do I deal with this problem? Paulo E. Cardoso
Re: [R] Error when sampling from SpatialLines
Which version of the sp package are you using? In a version that has been submitted to CRAN, and is available for CVS checkout from the R-spatial sourceforge repository, a weakness was addressed when a Line object received no sample points, either because of its shortness, or because n is small for the total length of points. I suggest that you try the latest version of sp from the CVS repository. Roger Bivand

PS. The traffic that you mention about sampling on lines was on the R-sig-geo list, not this list, so a post there might have been a better idea. PPS. Always state package version numbers when reporting problems, for example by giving sessionInfo() output. For errors, always also give the output of traceback(), or at least its first lines, which may show where the problem arises.

Elke Moons wrote: Dear forum, I am working in R 2.9.1 and I am trying to sample locations from a network file. Neither reading in nor plotting is a problem; however, when I try to sample from the file I get the following message:

nwlim <- readShapeLines("C:/Limburg_nwshape", proj4string=CRS("+init=epsg:31300"))
plot(nwlim)
randacc <- spsample(nwlim, n=1000, "random")
Error in function (classes, fdef, mtable) : unable to find an inherited method for function "coordinates", for signature "numeric"

I found a similar error message in the archives, where prof. dr. Bivand indicated to try readOGR instead of readShapeLines; however, I get the same error message.

nwlim2 <- readOGR("C:/Limburg_nwshape.shp", "Limburg_nwshape")
OGR data source with driver: ESRI Shapefile Source: "C:/Limburg_nwshape.shp", layer: "Limburg_nwshape" with 60745 rows and 24 columns Feature type: wkbLineString with 2 dimensions
randacc <- spsample(nwlim2, n=1000, "random")
Error in function (classes, fdef, mtable) : unable to find an inherited method for function "coordinates", for signature "numeric"

Can anyone help me with this issue?
The shape file in itself is not so large (only a province was taken instead of the entire file of Belgium because of memory issues), only 8.11 Mb. Thanks in advance! Elke

__ Elke Moons, PhD Instituut voor Mobiliteit/Transportation Research Institute Universiteit Hasselt Wetenschapspark 5/Lokaal 1.10 3590 Diepenbeek BELGIUM Tel. +32-11-26.91.26 Fax +32-11-26.91.99 E-mail: elke.mo...@uhasselt.be
Re: [R] Matrix multiplication precision
On Wed, Jul 15, 2009 at 3:42 AM, Nair, Murlidharan T mn...@iusb.edu wrote: Hi!! I am trying to multiply 5 matrices and then use the inverse of that matrix for further computation. I read about precision problems from the archives and the suggestion was to use as.numeric while computing the products.

Hmm. I'm not sure what the origins of that advice might be but I don't think it would apply in general.

I am still having problems with the results. Here is how I am using it:

# Mn.mat <- (T.mat %*% Rz.mat %*% Q.mat %*% Rz.mat %*% T.mat)
# I was doing this in one step, which I have broken down into multiple steps as below.
Mn1.mat <- matrix(as.numeric(T.mat %*% Rz.mat), nrow=4)
Mn2.mat <- matrix(as.numeric(Mn1.mat %*% Q.mat), nrow=4)
Mn3.mat <- matrix(as.numeric(Mn2.mat %*% Rz.mat), nrow=4)
Mn.mat <- matrix(as.numeric(Mn3.mat %*% T.mat), nrow=4)

I doubt very much that the as.numeric would have any effect on precision in these cases. You are simply taking a numeric result in its matrix form, converting it to a vector, then converting the vector back to a matrix:

mm <- matrix(round(rnorm(8), 3), nrow = 4)
mm
       [,1]  [,2]
[1,] -0.110 2.195
[2,]  0.616 0.353
[3,]  0.589 0.970
[4,]  1.028 0.857
as.numeric(mm)
[1] -0.110 0.616 0.589 1.028 2.195 0.353 0.970 0.857

The key in situations like this is to analyze the structure of the computation and decide whether you can group the intermediate results. You have written your product as T %*% Rz %*% Q %*% Rz %*% T. What are the characteristics of T, Rz, and Q? Are they square and symmetric, square and triangular, diagonal? Is Q orthogonal (the factors in an orthogonal-triangular decomposition are often written as Q and R)? Did you happen to skip a few transposes in that formula - often formulas like that generate symmetric matrices.
And the big lesson, of course, is the first rule of numerical linear algebra: just because a formula is written in terms of the inverse of a matrix doesn't mean that is a good way to calculate the result; in fact, it is almost inevitably the worst way of calculating the result. You only calculate the inverse of a matrix after you have investigated all other possible avenues for calculating the result and found them to be fruitless. You haven't told us what you plan to do with the inverse, and that is an important consideration. If, for example, it represents a variance-covariance matrix, and you want to express the result as standard deviations and correlations, you would not compute the variance-covariance matrix but stop instead at the Cholesky factor of the inverse. You may want to check the facilities of the Matrix package for expressing your calculation. It is a recommended package that is included with binary versions of R from 2.9.0 and has many ways of expressing and exploiting structure in matrices.

For getting the inverse I am doing the following:

Mn.mat.inv <- qr.solve(Mn.mat)

Will I run into precision problems when I do the above? Thanks ../Murli
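As a small illustration of the point above about avoiding explicit inverses (a sketch, not from the original thread; the matrix here is made up): when the inverse is only needed to multiply a vector, solve(A, b) factors A once and solves the system directly, which is both cheaper and numerically more stable than forming solve(A) first.

```r
set.seed(42)
A <- crossprod(matrix(rnorm(16), 4, 4)) + diag(4)  # a made-up, well-conditioned 4x4 matrix
b <- rnorm(4)

x.inverse <- solve(A) %*% b  # via the explicit inverse: avoid this
x.direct  <- solve(A, b)     # solve the linear system directly: preferred

all.equal(as.vector(x.inverse), x.direct)
```

On a well-conditioned matrix the two agree to rounding error; on an ill-conditioned one the direct solve is the safer route.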
Re: [R] Read PNG file and display with more than 256 colors RGDAL
If you read the help page for the image method you are actually using: ?image.SpatialGridDataFrame you will see that it supports red=, green=, and blue= arguments. This suggests that you could try:

myTile <- readGDAL("basemap.png", silent=TRUE)
class(myTile)
summary(myTile)
# assuming the band positions are 1, 2, 3
image(myTile, red=1, green=2, blue=3)

On the other hand, the colours in these images are arbitrary manipulations of the sensors, and have already been warped and resampled, so decimating the colour representation isn't inappropriate, and 256 is not a small number. Roger Bivand

PS. Detailed rgdal questions like this may get a faster response on R-sig-geo.

Paul.Rustomji wrote: Hello list, I am trying to use a Googlemaps tile (a 640 x 640 px PNG file) as a background to a plot and have been using the rgdal library to read in the PNG file (modified from code in the RGoogleMaps package). This works OK. My problem is that the SGDF2PCT function in rgdal seems to be limited to 256 colors and this is introducing (I think) some degradation to the image, which appears on the output graph as speckling. Is there a way around this? I have been reading about the EBImage package but it (a) seemed to have too many dependencies on other software for my purposes and (b) wouldn't load properly in any case on my machine (couldn't find DLLs that it was looking for). Ideally I would like to read in the png image and display it in its original form as a plot background, hopefully with all its colors represented rather than being converted to a 256 color scale.
A code example is below:

png(file="file.png", 4000, 4000)
par(mar=c(0,0,0,0))
myTile <- readGDAL("basemap.png", silent=TRUE) # basemap.png is a PNG file returned from googlemaps
myTile@data <- myTile@data[, 1:3]
col <- SGDF2PCT(myTile, ncolors=256) # myTile is a SpatialGridDataFrame with 3 bands
myTile$ind <- col$idx # add the colour index to the data frame
myTile <- as.image.SpatialGridDataFrame(myTile["ind"], 1, 2)$z
attr(myTile, "COL") <- col$ct
attr(myTile, "type") <- "rgb"
myTile <- list(lat.center, lon.center, NA, myTile)
image(z=myTile[[4]], col = attr(myTile[[4]], "COL"))
dev.off()

sessionInfo() R version 2.9.1 (2009-06-26) i386-pc-mingw32 locale: LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] abind_1.1-0 rgdal_0.6-9 sp_0.9-37 gridBase_0.4-3 loaded via a namespace (and not attached): [1] lattice_0.17-25 tools_2.9.1 Paul Rustomji
Re: [R] Quotes - useFancyQuotes
On 15/07/2009 7:28 AM, Paulo E. Cardoso wrote: I need to get the exact sentence, with the quotes: OPTIONS COORDSYS("Surface 1" AS COMPONENT); but this:

options(useFancyQuotes = FALSE)
paste("OPTIONS COORDSYS(", dQuote("Surface 1"), " AS COMPONENT);")

gives me this:

[1] "OPTIONS COORDSYS( \"Surface 1\" AS COMPONENT); "

and it's not readable as SpatialSQL code by the GIS. How do I deal with this problem?

Your string is fine; it is only being displayed with escapes. If you want to see it as it is, use cat(), not the default print(). For example:

cat("OPTIONS COORDSYS( \"Surface 1\" AS COMPONENT);\n")
OPTIONS COORDSYS( "Surface 1" AS COMPONENT);

Duncan Murdoch
[R] configure encoding by default
I want to use Russian letters for my diagrams. I do it in this manner:

m <- "заголовок"
Encoding(m) <- "UTF-8"
plot(1, 1, main=m)

But it is not convenient. How do I configure R to use UTF-8 for all strings, so that it works without the Encoding function, as in plot(1, 1, main="заголовок")?
[R] problem with merging matrices
Dear all, I'm a relatively new user of R and I have a problem with merging a collection of matrices. All matrices in this collection have the same dimension (532 rows and 532 columns), but can differ in the row and column names. I'd like to merge these matrices in such a way that the resulting matrix only contains unique row and column names and that, in case of a match between the row or column names, the smallest value is retained. As an example says more:

A1 <- matrix(1:9, ncol=3, byrow=TRUE)
rownames(A1) <- colnames(A1) <- c("a","b","c")
A1
  a b c
a 1 2 3
b 4 5 6
c 7 8 9

A2 <- matrix(c(7,1,3,10,2,7,3,1,8), ncol=3, byrow=TRUE)
rownames(A2) <- colnames(A2) <- c("a","y","z")
A2
   a y z
a  7 1 3
y 10 2 7
z  3 1 8

I want something like this to be returned:

   a  b  c  y  z
a  1  2  3  1  3
b  4  5  6 NA NA
c  7  8  9 NA NA
y 10 NA NA  2  7
z  3 NA NA  1  8

Many thanks, Jurgen. I'm using: R-2.9.1 (2009-06-26) on a Windows XP platform.
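One possible way to do this (a sketch, not part of the original message; the helper name merge_min is made up): expand each matrix to the union of the row and column names, then take the elementwise minimum, ignoring NA.

```r
merge_min <- function(A, B) {
  rn <- union(rownames(A), rownames(B))
  cn <- union(colnames(A), colnames(B))
  # place each matrix into an NA-filled frame covering all names
  expand <- function(M) {
    out <- matrix(NA, length(rn), length(cn), dimnames = list(rn, cn))
    out[rownames(M), colnames(M)] <- M
    out
  }
  pmin(expand(A), expand(B), na.rm = TRUE)  # smallest value wins where both are present
}

merge_min(A1, A2)  # A1, A2 as defined in the message above
```

For the whole collection, Reduce(merge_min, list.of.matrices) would fold the same function over a list of matrices.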
Re: [R] (simple) xml into data.frame and reverse
Thanks Dieter. It should have been http://www.omegahat.org/RSXML/xmlToDataFrame.R as it is an R file. Thanks D.

Dieter Menne wrote: Duncan Temple Lang wrote: I wrote some relatively general, but hastily written, functions to read this sort of data. You can find them attached or at http://www.omegahat.org/RSXML/xmlToDataFrame.xml

Looks like that's the wrong link. I also did not find it mentioned in the parent page. Dieter
[R] Mixdist producing Na's and NaN's
When using mix() from the mixdist package, I sometimes get NA's and NaN's in the standard error estimates. I've understood from the code that mix() uses non-linear minimization (nlm) to estimate those errors, but I can't understand how to interpret these results. Does anyone have a (web) reference to suggest, or an interpretation? Thanks, Etienne
Re: [R] configure encoding by default
On 7/15/2009 8:22 AM, Угодай n/a wrote: I want to use Russian letters for my diagrams. I do it in this manner:

m <- "заголовок"
Encoding(m) <- "UTF-8"
plot(1, 1, main=m)

But it is not convenient. How do I configure R to use UTF-8 for all strings, so that it works without the Encoding function, as in plot(1, 1, main="заголовок")?

You need to set your locale, either at the system level (and R will use it) or with Sys.setlocale() within a session. The details of how to specify that you want UTF-8 characters vary by system; I'm not sure it's possible on Windows, for instance. See the R Installation and Administration manual chapter on internationalization. Duncan Murdoch
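A minimal sketch of the session-level approach described above (not from the original thread; the locale name is an assumption and varies by system, and it must be installed on the machine):

```r
# On a Unix-alike with a Russian UTF-8 locale available:
Sys.setlocale("LC_ALL", "ru_RU.UTF-8")
plot(1, 1, main = "заголовок")  # string literals are now handled in the UTF-8 locale
```

Sys.setlocale() returns "" with a warning if the requested locale is not available, so it is worth checking its return value.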
Re: [R] ggplot2: geom_errorbarh()
In ggplot2, you'll want to use size. Hadley

On Wed, Jul 15, 2009 at 12:36 PM, Benoit Boulinguiez benoit.boulingu...@ensc-rennes.fr wrote: Hi, I expect 'width' to set the width of the horizontal line of the whisker, as it does with errorbar(). Regards/Cordialement Benoit Boulinguiez

-----Original Message----- From: hadley wickham [mailto:h.wick...@gmail.com] Sent: Sunday, 12 July 2009 11:04 To: Benoit Boulinguiez Cc: r-help@r-project.org Subject: Re: [R] ggplot2: geom_errorbarh()

Hi Benoit, What do you expect width to do? You are already setting the left and right extents with xmin and xmax. Hadley

On Thu, Jul 9, 2009 at 10:37 AM, Benoit Boulinguiez benoit.boulingu...@ensc-rennes.fr wrote: Hi all, quick question: is the optional argument width effective in the geom_errorbarh() layer of ggplot? I can't get it to work on this graph http://www.4shared.com/file/116919103/93488d88/iso_2PrsH.html

pdf(file = "iso_2PrsH.pdf", width = 7, height = 7)
NC60.iso.graph <- ggplot(NC60.DATA, aes(Ce, Qe)) +
  geom_point(col = MaCouleur1, size = 4) +
  geom_errorbar(aes(ymax = NC60.DATA$Qe + NC60.DATA$sdQe,
                    ymin = NC60.DATA$Qe - NC60.DATA$sdQe),
                colour = alpha("black", 0.4), width = 1) +
  geom_errorbarh(aes(xmax = NC60.DATA$Ce + NC60.DATA$sdCe,
                     xmin = NC60.DATA$Ce - NC60.DATA$sdCe),
                 colour = alpha("black", 0.4), width = 1) +
  geom_line(data = NC60.Res4.curve, aes(x, y), size = 1, colour = alpha("black", 0.5)) +
  xlab(C[e]~(mmol/m^3)) + ylab(q[e]~(mmol/m^3))
print(NC60.iso.graph)
dev.off()

Regards/Cordialement - Benoit Boulinguiez Ph.D student Ecole de Chimie de Rennes (ENSCR) Bureau 1.20 Equipe CIP UMR CNRS 6226 Sciences Chimiques de Rennes Avenue du Général Leclerc CS 50837 35708 Rennes CEDEX 7 Tel 33 (0)2 23 23 80 83 Fax 33 (0)2 23 23 81 20 http://www.ensc-rennes.fr/
-- http://had.co.nz/
Re: [R] hi friends, is there any wait function in R
Hi, deepak m r deepak...@gmail.com wrote on 14.07.2009 16:07:16: Hi, I am not an expert at debugging R; can you please help? best regards deepak

See fortune("brain"). You do not need to debug R; I believe R-Core will do it for you (if you really find some bug in R, which seems highly improbable). Instead, see which plots are wrong and try to look at your data with any suitable functions provided by R itself (str(), ls(), head()). At first sight, from 1:length(paste("alp", s[t], "_mean", sep="")) I expect to get 1. What do ***you*** expect to get? Regards Petr

On Tue, Jul 14, 2009 at 10:06 AM, deepak m r deepak...@gmail.com wrote: Hi, On Tue, Jul 14, 2009 at 9:38 AM, Duncan Murdoch murd...@stats.uwo.ca wrote: On 7/14/2009 8:56 AM, deepak m r wrote: Hi, An empty plot is appearing and I don't know why. Can you please clarify how I can use the print function instead of the plot function?

You need print() if you are using grid-based graphics (lattice, ggplot2, ...) in a script. You are using classic graphics, so it should not be necessary. If you are getting blank plots when you shouldn't, that's a bug. If you can put together a reproducible example that shows it's a bug in R, rather than a bug in your script, it will likely be fixed fairly quickly. Duncan Murdoch

best regards deepak

On Tue, Jul 14, 2009 at 7:48 AM, Petr PIKAL petr.pi...@precheza.cz wrote: Hi, For this type of problem I do a multipage pdf:

pdf(file, ...)
for (i in ...) {
  # do all stuff including plot
}
dev.off()

and then check the plots afterwards. Recently there was some post about how to wait, but you do not want only to wait, you want also to interactively change plotting parameters, don't you?

cat("\n", "Enter x", "\n") # prompt
x <- scan(n=1) # read 1 line from console

This construction prints something on the console and reads one line from the console. There are also some packages which let you choose from several options. I think car and randomForest contain such routines, so you could check how it is done.
Regards Petr

r-help-boun...@r-project.org wrote on 14.07.2009 13:17:01: hi, is there any wait function in R? I am running one R script to plot many graphs in a for loop. It's showing no error but it's not plotting well. I think I can solve this problem with a wait function. Please help me in this regard. If you need any clarification about the programme, you can find the script below. best regards, Deepak.M.R Biocomputing Group University of Bologna.

#!/usr/bin/R
s <- c("GG","GA","GV","GL","GI","GM","GF","GW","GP","GS","GT","GC","GY","GN","GQ","GD","GE","GK","GR","GH","AA","AV","AL","AI","AM","AF","AW","AP","AS","AT","AC","AY","AN","AQ","AD","AE","AK","AR","AH","VV","VL","VI","VM","VF","VW","VP","VS","VT","VC","VY","VN","VQ","VD","VE","VK","VR","VH","LL","LI","LM","LF","LW","LP","LS","LT","LC","LY","LN","LQ","LD","LE","LK","LR","LH","II","IM","IF","IW","IP","IS","IT","IC","IY","IN","IQ","ID","IE","IK","IR","IH","MM","MF","MW","MP","MS","MT","MC","MY","MN","MQ","MD","ME","MK","MR","MH","FF","FW","FP","FS","FT","FC","FY","FN","FQ","FD","FE","FK","FR","FH","WW","WP","WS","WT","WC","WY","WN","WQ","WD","WE","WK","WR","WH","PP","PS","PT","PC","PY","PN","PQ","PD","PE","PK","PR","PH","SS","ST","SC","SY","SN","SQ","SD","SE","SK","SR","SH","TT","TC","TY","TN","TQ","TD","TE","TK","TR","TH","CC","CY","CN","CQ","CD","CE","CK","CR","CH","YY","YN","YQ","YD","YE","YK","YR","YH","NN","NQ","ND","NE","NK","NR","NH","QQ","QD","QE","QK","QR","QH","DD","DE","DK","DR","DH","EE","EK","ER","EH","KK","KR","KH","RR","RH","HH")
for (t in 1:length(s)) {
  a <- read.table(paste("../All_alpha_proteins/alp", s[t], "mean.sat", sep=""), header=TRUE)
  attach(a)
  names(a)
  al <- 1:length(paste("alp", s[t], "_mean", sep=""))
  b <- read.table(paste("../All_beta_proteins/bet", s[t], "mean.sat", sep=""), header=TRUE)
  attach(b)
  names(b)
  bl <- 1:length(paste("bet", s[t], "_mean", sep=""))
  p <- read.table(paste("../Alpha_and_beta_proteins_a+b/apb", s[t], "mean.sat", sep=""), header=TRUE)
  attach(p)
  names(p)
  pl <- 1:length(paste("apb", s[t], "_mean", sep=""))
  o <- read.table(paste("../Alpha_and_beta_proteins_aorb/aob", s[t], "mean.sat", sep=""), header=TRUE)
  attach(o)
  names(o)
  ol <- 1:length(paste("aob", s[t], "_mean", sep=""))
  postscript(file=paste("Mean_", s[t], "_Plot.ps", sep=""))
  plot(0, xlim=c(0,160), ylim=c(0,70), main=paste("Mean Distance Plot for ", s[t], " Residue pair", sep=""), ylab="Mean Distance in Angstrom", xlab="Residue Separation Number")
  lines(al, paste("alp", s[t], "_mean", sep=""), col="blue", lty = 2)
  lines(bl, paste("bet", s[t], "_mean", sep=""), col="yellow", lty = 2)
  lines(pl, paste("apb", s[t], "_mean", sep=""), col="red", lty = 2)
  lines(ol, paste("aob", s[t], "_mean", sep=""), col="green", lty = 2)
  legend("topleft", c("Alpha Proteins","Beta Proteins","Alpha + Beta","Alpha or Beta"), lty = c(2,2,2,2), col=c("blue","yellow","red","green"))
  dev.off()
}
Re: [R] nls, reach limit bounds
Hi Uyen, In addition to my suggestions in the previous email, you may also want to check whether the LL.4 function in the drc package has the same parametrization as the function that you are using. Different parametrizations of the same model can have dramatically different convergence behavior in nonlinear least squares problems. Ravi.

--- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ravi Varadhan Sent: Tuesday, July 14, 2009 7:35 PM To: 'UyenThao Nguyen'; 'spencerg' Cc: r-help@r-project.org Subject: Re: [R] nls, reach limit bounds

I took a quick look at the drc package and the drm function. The drm() function uses optim (BFGS method). So, that is one difference. However, without seeing your code showing how you used drm(), I cannot tell further. The fact that you got an answer using optim() does not necessarily mean that everything is swell. Did you check the Hessian to see if it is positive-definite? You might also want to try the nls.lm() function in the minpack.lm package. Ravi.

--- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

-----Original Message----- From: UyenThao Nguyen [mailto:ungu...@tethysbio.com] Sent: Tuesday, July 14, 2009 7:07 PM To: Ravi Varadhan; 'spencerg' Cc: r-help@r-project.org Subject: RE: [R] nls, reach limit bounds

Hi Ravi and Spencer, Thank you very much for your help.
I did plot the data, and saw that the data didn't seem to have an inflection point. Yes, the data contained 6 points of duplicates, for which the 4-parameter logistic regression is appropriate. I tried the dose response model (drm in the drc package); this model works without any problem. Do you know if drm has a different convergence tolerance or something else? Let's say I choose drm to fit the data; can I get the parameters in the same way nls gives them to me? I tested a converged data set on both drm and nls, and I can't see any relationship between their parameters although the fits were similar. Thank you. Uyen

-----Original Message----- From: Ravi Varadhan [mailto:rvarad...@jhmi.edu] Sent: Monday, July 13, 2009 3:32 PM To: 'spencerg'; UyenThao Nguyen Cc: r-help@r-project.org Subject: RE: [R] nls, reach limit bounds

Hi Uyen, You do not have enough information to estimate 4 parameters in your nonlinear model. Even though you have 12 data points, only 6 are contributing independent information (you essentially have 6 replicate points). If you plot the fitted function you will see that it fits your data really well. However, you will also see that the fitted curve is far from reaching the asymptote. An implication of this is that you cannot reliably estimate b0 and b1 separately. So, you need more data, especially for larger x values in the asymptotic region. Ravi.

--- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of spencerg Sent: Saturday, July 11, 2009 9:50 PM To: UyenThao Nguyen Cc: r-help@r-project.org Subject: Re: [R] nls, reach limit bounds

Have you plotted the data?
There are two standard sources of non-convergence problems like this: First, there may not be enough information in your data to estimate all four parameters. Second, if that's not the case, then your starting values are not appropriate. I routinely use optim or nlminb to find a sensible solution, because these general-purpose optimizers are more likely to converge than nls. To do this, I write a function with a name like SSElogistic to compute the sum of squares of residuals for your model.
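A minimal sketch of that approach (illustrative only, not from the thread; the function name, the particular 4-parameter logistic form, and the example data are assumptions):

```r
# Sum of squared residuals for a 4-parameter logistic:
# f(x) = b0 + (b1 - b0) / (1 + exp((xmid - x) / scal))
SSElogistic <- function(par, x, y) {
  pred <- par[1] + (par[2] - par[1]) / (1 + exp((par[3] - x) / par[4]))
  sum((y - pred)^2)
}

# made-up example data
set.seed(1)
x <- 1:12
y <- 2 + 8 / (1 + exp((6 - x) / 1.5)) + rnorm(12, sd = 0.1)

# minimize the SSE directly with a general-purpose optimizer
fit <- optim(c(0, 10, 5, 1), SSElogistic, x = x, y = y)
fit$par  # estimates of b0, b1, xmid, scal
```

Once optim or nlminb finds a sensible minimum, its solution can be passed to nls as starting values if standard nls output is wanted.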
[R] Question related to merging zoo objects and apply function
Hello everyone, Say I have defined a convenience function in my code like this:

func <- function(x, j){
  x[168+j] <- x[72+j] + x[144+j]
}

And now I apply it to some columns in a big zoo object like this:

for (m in 1:24){
  combined <- merge(combined, LA1sum=apply(combined, 1, func, j=m))
}

The output of this for-loop will be a zoo object with columns named LA1sum.1, LA1sum.2, ..., LA1sum.24. If I have a vector of names like this:

namesVec <- c("LA1sum", "LP1sum", "GC1sum", "LL1sum", "LN1sum", "SI1sum", "LX1sum", "CO1sum", "CL1sum", "QS1sum", "HO1sum", "NG1sum", "XB1sum", "C.1sum", "FC1sum", "LH1sum", "LC1sum", "S.1sum", "W.1sum", "KW1sum", "CC1sum", "KC1sum", "CT1sum", "SB1sum")

what I want is for the merge() at each m to name the new column after the corresponding element of namesVec, that is, for m=1 I have LA1sum=apply(..., j=1), for m=2 I have LP1sum=apply(..., j=2), etc. What is the way to do this? Thank you in advance for your help! Regards, Sergey
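One possible sketch (not part of the original message, and assuming the merge() call behaves as it does in the poster's own loop): keep the merge as-is and rename the column that was just added.

```r
# hypothetical variant of the poster's loop: rename after each merge
for (m in seq_along(namesVec)) {
  combined <- merge(combined, apply(combined, 1, func, j = m))
  colnames(combined)[ncol(combined)] <- namesVec[m]  # name the newly added column
}
```

The newly merged series is always the last column, so indexing colnames() with ncol() picks it out at each step.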
[R] DECLARING A PANEL VARIABLE???
Hi, I am working on panel data; my data are clustered/grouped by the variable yearctry. I am running the regression below, but I can't make the regression recognise yearctry as the panel variable:

myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc + imf_pop + estbbo_m, family = binomial(link = "probit"), data = adpopdata)

Can anyone help me do this please??? Thanks Saurav

-- Dr. Saurav Pathak PhD, Univ. of Florida, Mechanical Engineering Doctoral Student, Innovation and Entrepreneurship Imperial College Business School s.patha...@imperial.ac.uk 0044-7795321121
[R] Simple cat statement - output truncated
I have a statement:

cat("myforecast ETS(", paste(object$components[1], object$components[2], object$components[3], object$components[4], sep = ","), ") ", n, "\n")

that generates:

cast ETS( A,N,N,FALSE ) 3

Can anyone guess why the first 5 letters are truncated/missing? Kevin
Re: [R] Grouping data in dataframe
Another approach is to use the reshape package. Assuming your data frame is called xx:

library(reshape)
mm <- melt(xx, id = c("Size")); mm
cast(mm, Size ~ variable, median)

--- On Tue, 7/14/09, Timo Schneider timo.schnei...@s2004.tu-chemnitz.de wrote:

Subject: [R] Grouping data in dataframe
To: r-help@r-project.org
Received: Tuesday, July 14, 2009, 11:56 PM

Hello, I have a dataframe (obtained from read.table()) which looks like

  ExpA ExpB ExpC Size
1   12   23   33    1
2   12   24   29    1
3   10   22   34    1
4   25   50   60    2
5   24   53   62    2
6   21   49   61    2

Now I want to take all rows that have the same value in the Size column and apply a function to the columns of these rows (for example median()). The result should be a new dataframe with the medians of the groups, like this:

  ExpA ExpB ExpC Size
1   12   23   34    1
2   24   50   61    2

I tried to play with the functions by() and tapply() but I didn't get the results I wanted so far, so any help on this would be great!

The reason why I am having this problem (I explain this just to make sure I don't do something against the nature of R): I am doing 3 similar experiments, A, B and C, and I change a parameter in the experiment (size). Every experiment is done multiple times and I need the median or average over all experiments that are the same. Should I preprocess my data files so that they are completely different? Or is it fine the way it is and I just overlooked the simple solution to the problem described above?

Regards, Timo
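For completeness, the same grouped-median result can be had in base R without any extra packages, using aggregate() with a formula. A minimal sketch on a data frame shaped like the one in the question:

```r
# Hypothetical data frame shaped like the one in the question.
xx <- data.frame(ExpA = c(12, 12, 10, 25, 24, 21),
                 ExpB = c(23, 24, 22, 50, 53, 49),
                 ExpC = c(33, 29, 34, 60, 62, 61),
                 Size = c(1, 1, 1, 2, 2, 2))

# Median of each Exp column within each Size group.
res <- aggregate(cbind(ExpA, ExpB, ExpC) ~ Size, data = xx, FUN = median)
res
#   Size ExpA ExpB ExpC
# 1    1   12   23   33
# 2    2   24   50   61
```

aggregate() returns one row per group, with the grouping column first.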
Re: [R] Simple cat statement - output truncated
On 7/15/2009 9:53 AM, rkevinbur...@charter.net wrote:

  I have a statement:

  cat("myforecast ETS(", paste(object$components[1], object$components[2],
      object$components[3], object$components[4], sep = ","), ") ", n, "\n")

  That generates:

  cast ETS( A,N,N,FALSE ) 3

  Anyone have a guess as to why the first letters are truncated/missing?

You are probably being punished for posting non-reproducible code*. When I try a reproducible version of the line above, things look fine:

cat("myforecast ETS(", paste("A", "N", "N", "FALSE", sep = ","), ") ", 3, "\n")
myforecast ETS( A,N,N,FALSE )  3

Duncan Murdoch

* R has a new predictive punishment module. It punishes you for things it knows you will do later.
[R] Plotting hourly time-series data loaded from file using plot.ts
Hello everyone,

I am just a tyro in R and would like your kind help with some problems I've been struggling with for a while, but still in vain. I have a time-series file (with some missing values) which looks like

time[sec] , Factor1 , Factor2
00:00:00 01.01.2007 , 0. , 0.176083
01:00:00 01.01.2007 , 0. , 0.176417
[ ... ]
11:00:00 10.06.2007 , 0. , 0.148250
12:00:00 10.06.2007 , NA , 0.147000
13:00:00 10.06.2007 , NA , 0.144417
[ ... ]

and I would like to do some basic time-series analyses using R. The first idea is to plot these time series, and the main problem was the handling of the date/time format in the 1st column. I was using the script below to deal with it:

data <- read.table(file, header = TRUE, sep = ",",
                   colClasses = c("character", "numeric", "numeric"))
data$time.sec. <- as.POSIXct(data$time.sec., format = "%H:%M:%S %d.%m.%Y")
dataTs <- as.ts(data)
plot.ts(dataTs)

Then the plot showed up with 3 subplots in one plot. The 1st is a straight line with the x-axis being just the sequence of row numbers and the y-axis being wrong numbers, which is completely wrong. The 2nd and the 3rd are correct, but the x-axis is still wrong. Does anyone know how to plot Factor1 and Factor2 correctly against the time format of the 1st column? Or should I use some other package? Besides, how can I plot these two series (Factor1 and Factor2) in two separate plots?

Best regards, Keith
Re: [R] Matrix multiplication precision
Hi Douglas,

  And the big lesson, of course, is the first rule of numerical linear algebra: just because a formula is written in terms of the inverse of a matrix doesn't mean that is a good way to calculate the result; in fact, it is almost inevitably the worst way of calculating the result. You only calculate the inverse of a matrix after you have investigated all other possible avenues for calculating the result and found them to be fruitless.

As a relative noob to the nitty-gritty details of numerical linear algebra, could you elaborate on this a bit (even if it's just a link to a book/reference that explains these issues in more detail)? Where/what else should we be looking to do that would be better? Or are there really no general rules, and the answer just depends on the situation?

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
Contact info: http://cbio.mskcc.org/~lianos/contact
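The rule of thumb here has a direct R illustration: for a linear system A x = b, calling solve(A, b) uses a matrix factorization, while solve(A) %*% b first forms the explicit inverse, doing more work and losing accuracy as A becomes ill-conditioned. A small sketch with made-up data:

```r
set.seed(42)
A <- crossprod(matrix(rnorm(25), 5))  # a random 5x5 symmetric positive-definite matrix
b <- rnorm(5)

x_inv    <- solve(A) %*% b  # via the explicit inverse: more work, less accurate
x_direct <- solve(A, b)     # via a factorization: the recommended route

all.equal(drop(x_inv), x_direct)  # agree here; differences grow with ill-conditioning
```

For least squares the same principle says to prefer lm() or qr.solve() over forming (X'X)^-1 explicitly.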
Re: [R] (simple) xml into data.frame and reverse
Hi Duncan, thanks for the advice and the link. I just realized there is a mistake in the code I posted and I currently don't have the time to correct it, so I cannot try your function (yet). But I will do so in the next few days and tell you if it doesn't work. Thanks again and best, Stefan

On Wed, Jul 15, 2009 at 2:37 PM, Duncan Temple Lang dun...@wald.ucdavis.edu wrote:

  Thanks Dieter. It should have been http://www.omegahat.org/RSXML/xmlToDataFrame.R as it is an R file. Thanks, D.

  Dieter Menne wrote:

    Duncan Temple Lang wrote: I wrote some relatively general, but hastily written, functions to read this sort of data. You can find them attached or at http://www.omegahat.org/RSXML/xmlToDataFrame.xml

    Looks like that's the wrong link. I also did not find it mentioned in the parent page. Dieter
[R] Change data frame column names
Hi R helpers,

I have a data frame and I want to change the column names to names I have held in another data frame, but I am having difficulty. All data frames are large so I can't input the names manually. Below is what I have tried so far:

df <- data.frame(a=1:5, b=2:6, d=3:7, e=4:8)
coltitles <- data.frame(v1="col number one", v2="col number two",
                        v3="col number three", v4="col number four")

## first attempt
names(df) <- coltitles
names(df)
[1] "1" "1" "1" "1"

### not what I wanted, as I want names(df) to return
[1] "col number one" "col number two" "col number three" "col number four"

## second attempt: trying to convert to a character vector after looking at the help
coltitles <- as.vector(coltitles, mode="character")
is.vector(coltitles)
[1] TRUE
names(df) <- coltitles
names(df)
[1] "1" "1" "1" "1"

### again not what I wanted

How can I convert the column names?

Thanks in advance, Tom
[R] Spaces in a name
I am reading regressors from an Excel file (I have no control over the file) and some of the element names have spaces, e.g. "Small Bank Aquired", but I have found that

lm(SourceData ~ . - Small Bank Aquired, mcReg)

doesn't work (mcReg = modelCurrentRegressors). As they are toggles I have run them through factor() to be treated properly as 0 or 1, but because I am automagically grabbing the first 2/3rds of the data, some of the regressors are either all 0s or all 1s, so I need to take them out of the model by hand for now, until I find a nice automatic method for removing regressors that have only 1 factor level.

So, primarily: how do I handle names that include spaces in this context? And as a bonus: anyone have a nice method for yanking regressors that have only a single factor level from the lm() formula? E.g., for the following 30 elements:

0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
1,1,1,1,1,1,1,1,1,1

As you can see, grabbing the first 2/3rds gives all 0s and the last 1/3rd is all 1s. (I am doing in-sample forecast diagnostics: building the model only on the first 2/3rds of the data, then forecasting the next 1/3rd and comparing.) Sorry if I am rambling a bit, still on cup of coffee #1...
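Both parts of the question have standard idioms: backticks let a formula refer to a name containing spaces, and constant columns can be filtered out before fitting. A hedged sketch; the data frame, the x1 column and the AllSame column are made up for illustration:

```r
set.seed(1)
# check.names = FALSE preserves the space-containing name, as read.table/Excel
# imports often do.
dat <- data.frame(y  = rnorm(10),
                  x1 = rnorm(10),
                  `Small Bank Aquired` = factor(rep(c(0, 1), 5)),
                  AllSame = factor(rep(1, 10)),   # a degenerate one-level toggle
                  check.names = FALSE)

# 1) drop regressors with only one distinct value / factor level
keep <- vapply(dat, function(col) length(unique(col)) > 1, logical(1))
dat  <- dat[, keep, drop = FALSE]

# 2) backticks let the formula refer to a name containing spaces
fit <- lm(y ~ . - `Small Bank Aquired`, data = dat)
names(coef(fit))  # "(Intercept)" "x1"
```

An alternative to the backtick route is make.names() (or check.names = TRUE at import), which rewrites "Small Bank Aquired" to "Small.Bank.Aquired".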
Re: [R] Matrix multiplication precision
A good discussion of this is provided by Gill, Murray and Wright, Numerical Linear Algebra and Optimization, section 4.7.2.

url: www.econ.uiuc.edu/~roger
email: rkoen...@uiuc.edu
Roger Koenker, Department of Economics, University of Illinois, Urbana, IL 61801
vox: 217-333-4558, fax: 217-244-6678

On Jul 15, 2009, at 9:27 AM, Steve Lianoglou wrote:

  As a relative noob to the nitty-gritty details of numerical linear algebra, could you elaborate on this a bit (even if it's just a link to a book/reference that explains these issues in more detail)? [...]
Re: [R] predictive punishment module (was: Re: Simple cat statement - output truncated)
Duncan Murdoch murd...@stats.uwo.ca, 15/07/2009 15:04:29, wrote:

  * R has a new predictive punishment module. It punishes you for things it knows you will do later.

Dang! Does that mean it'll return errors for the statistical idiocy I haven't yet got around to committing? ;-)

Steve E
Re: [R] Change data frame column names
On 7/15/2009 10:35 AM, Tom Liptrot wrote:

  df <- data.frame(a=1:5, b=2:6, d=3:7, e=4:8)
  coltitles <- data.frame(v1="col number one", v2="col number two",
                          v3="col number three", v4="col number four")

It would be simpler to use

coltitles <- c(v1="col number one", v2="col number two",
               v3="col number three", v4="col number four")

(and the v1=, v2=, etc. are redundant). Then your first attempt,

  names(df) <- coltitles

would work. If you are taking titles from one data frame to put on another as in your original code, then use

names(df) <- names(coltitles)

Duncan Murdoch
Re: [R] The greatest common divisor between more than two integers
Thanks! I'll try that. Perhaps there is such a function in some package. Atte

On Wed, Jul 15, 2009 at 8:55 AM, Atte Tenkanen atte...@utu.fi wrote:

  Does somebody know if there is a function in R which computes the greatest common divisor of several (more than two) integers?

Is there a function for computing the greatest common divisor of *two* numbers? I can't find one, but assume that there is such a function, and call it gcd. Then you could define a recursive function to do the job. Something like

new_gcd <- function(v) {
  if (length(v) == 2) return(gcd(v[1], v[2]))
  else return(gcd(v[1], new_gcd(v[2:length(v)])))
}

where v is a vector containing the numbers you want to calculate the greatest common divisor of.

-- Michael Knudsen micknud...@gmail.com http://lifeofknudsen.blogspot.com/
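Since base R has no built-in gcd, here is a self-contained sketch: Euclid's algorithm for two numbers, folded over a vector with Reduce() so that any number of integers is handled without hand-written recursion over the vector:

```r
# Euclid's algorithm for two non-negative integers.
gcd2 <- function(a, b) if (b == 0) a else gcd2(b, a %% b)

# Fold the pairwise gcd over the whole vector:
# gcd(a, b, c) == gcd(gcd(a, b), c), so Reduce() does the recursion for us.
gcd <- function(v) Reduce(gcd2, v)

gcd(c(12, 18, 24))  # 6
gcd(c(7, 13))       # 1
```

The same Reduce() pattern works for least common multiples via lcm(a, b) = a * b / gcd2(a, b).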
Re: [R] Bug in package.skeleton, R 2.9.0?
Bump. Anyone? Thanks, Daniel

On 13 jul 2009, at 10.57, Daniel Klevebring wrote:

Dear all,

I am using package.skeleton to build a small package of misc functions for personal use. I have recently discovered that the option force=TRUE doesn't seem to do what it is meant to do. Here's what I'm doing:

setwd("/Users/danielk/Documents/R/packages/dk")
files <- paste(codebase, dir(codebase, pattern=".R"), sep="/")
package.skeleton(name="dk", force=TRUE, code_files=files)
Creating directories ...
Creating DESCRIPTION ...
Creating Read-and-delete-me ...
Copying code files ...
Making help files ...
Done.
Further steps are described in './dk/Read-and-delete-me'.

Now, everything seems fine, but changes to files in my codebase folder don't come along if the folder dk/R already contains the files, even though I use force=TRUE. If I remove the dk/R folder or the dk folder altogether, the changes come along, so to me it seems that it's the overwrite part that doesn't work as it should - or am I doing something wrong here? To me, it seems that the function safe.dir.create (which is defined in package.skeleton) never overwrites folders, rendering force=TRUE useless. See below for sessionInfo.

Thanks a bunch,
Daniel

sessionInfo()
R version 2.9.0 (2009-04-17)
i386-apple-darwin8.11.1
locale: en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.9.0

--
Contact information:
Daniel Klevebring, M.Sc.Eng., Ph.D. Student
Dept of Gene Technology, Royal Institute of Technology, KTH
SE-106 91 Stockholm, Sweden
Visiting address: Roslagstullsbacken 21, B3
E-mail: dan...@biotech.kth.se
Phone: +46 8 5537 8337 (Office), +46 704 71 65 91 (Mobile)
Web: http://www.biotech.kth.se/genetech/index.html
Fax: +46 8 5537 8481
[R] Is it possible to use EGARCH and GJR in R?
Hi,

Could you please help me with EGARCH and GJR? Is it possible to use EGARCH and GJR in R? I have used the code below for GARCH in R, but I have never used EGARCH or GJR in R. Thank you in advance!

daten <- read.table("H://Daten//Zeitreihen//dax_1.csv", sep=";", header=T)
DAX.kurs <- daten
DAX.kurs <- ts(DAX.kurs, names="DAX-Kurs")
DAX.rendite <- diff(DAX.kurs)/DAX.kurs[1:length(DAX.kurs)-1]
g <- garchFit(data=DAX.rendite, formula=~garch(1,1), cond.dist="dstd")
g.predict <- predict(g, n.ahead=1)

Best regards, Andy
Re: [R] [R-sig-Geo] how to work with useFancyQuotes( ) to avoid \ quotes
Try this:

options(useFancyQuotes = FALSE)
print(paste("OPTIONS COORDSYS(", dQuote("Surface 1"), " AS COMPONENT); "), quote = FALSE)

HTH, Jaime

On Wed, 15 Jul 2009 13:15:32 +0200, Paulo E. Cardoso pecard...@netcabo.pt wrote:

  I need to get the exact sentence, with the quotes:

  OPTIONS COORDSYS("Surface 1" AS COMPONENT);

  but this:

  options(useFancyQuotes = FALSE)
  paste("OPTIONS COORDSYS(", dQuote("Surface 1"), " AS COMPONENT); ")

  gives me this:

  [1] "OPTIONS COORDSYS( \"Surface 1\" AS COMPONENT); "

  And it's not readable as SpatialSQL code by the GIS. How do I deal with this problem?

  Paulo E. Cardoso
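The key point, worth making explicit: the backslashes are not in the string at all, they are only how print() escapes the embedded quotes on screen. Writing the string out with cat() (or writeLines(), or write() to a file) emits the plain quotes the GIS expects. A minimal sketch:

```r
options(useFancyQuotes = FALSE)  # plain ASCII quotes from dQuote()
s <- paste0("OPTIONS COORDSYS(", dQuote("Surface 1"), " AS COMPONENT);")

nchar(s)       # the string contains real quote characters, no backslashes
cat(s, "\n")   # OPTIONS COORDSYS("Surface 1" AS COMPONENT);
```

When generating a script file for the GIS, writeLines(s, "out.sql") produces the same unescaped text.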
Re: [R] Nagelkerkes R2N
I am interested, Andrea, in whether you ever established why your R2 was 1. I have had a similar situation previously. My main issue, though, which I'd be very grateful for advice on, is why I am obtaining such negative values (-0.3) for Somers' Dxy using validate.cph from the Design package, given that my Nagelkerke R2 is not so low (13.2%). I get this output when fitting 6 variables, all with p-values < 0.01, and I am wondering what the interpretation should be. I know my Nagelkerke R2 isn't very good, but comparing my results with the example from ?validate.cph, although I have a better R2 (13% vs 9%), the Somers' Dxy from the example data set is much better, 38%, so certainly not negative! So my main question is: why such a difference between explained variation (R2) and predictive ability (Somers' Dxy)?

Obs 471, Events 228, Model L.R. 66.36, d.f. 6, P 0, Score 73.41, Score P 0, R2 0.132

validate(f, B=150, dxy=T)  # normally B=150

        index.orig       training          test      optimism  index.corrected    n
Dxy  -0.3022537331  -0.3135032097  -0.292492573  -0.021010636    -0.2812430968  150
R2    0.1319445685   0.1431179294   0.122599605   0.020518324     0.1114262446  150
Slope 1.00           1.00           0.923340558   0.076659442     0.9233405576  150
D     0.0250864459   0.0276820092   0.023163167   0.004518842     0.0205676038  150
U    -0.0007676033  -0.0007725071   0.000610456  -0.001382963     0.0006153598  150
Q     0.0258540493   0.0284545164   0.022552711   0.005901805     0.0199522440  150

I also calculated the Schemper and Henderson V measure and obtained V = 10.5%. I was using the surev package of Lusa L, Miceli R, Mariani L, "Estimation of predictive accuracy in survival analysis using R and S-PLUS", Computer Methods and Programs in Biomedicine 2007;87(2):132-7 (http://www.biomedexperts.com/Abstract.bme/17601627/Estimation_of_predictive_accuracy_in_survival_analysis_using_R_and_S-PLUS). And my code was

library(surev)
pred.accuracy <- f.surev(f)
pred.accuracy

Sorry if my question isn't clear - should I have included my sessionInfo for a methodological question? (I'm a newbie.) Many thanks for any advice.
[R] Problems with computing an aggregated score
Hi all,

I have a problem with computing a new variable as an aggregated score of other variables. Say I have 10 variables for 1,000 observations (people). Every variable may have values between 0 and 8. What I would like to do is compute the mean of the individual top 3 values for every person.

Example: the values of the 10 variables (v1 to v10) for persons A, B and C are as follows:

A 0 1 0 2 5 8 3 0 4 0
B 6 4 3 0 0 0 0 5 0 0
C 0 0 8 0 0 8 0 0 0 0

So I would like to compute a new variable for the mean of the individual top 3 values, which for persons A, B and C would result in the following:

A (8+5+4)/3 = 5.67
B (6+5+4)/3 = 5
C (8+8+0)/3 = 5.33

Is there any way to do this? Any clues, hints and suggestions are highly appreciated. Many thanks in advance, Johannes
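The "mean of the top 3" described above is a one-liner per row: sort each row in decreasing order, keep the first three values, average them. A direct sketch using the three example rows from the question:

```r
# The three example persons from the question as rows of a matrix.
m <- rbind(A = c(0, 1, 0, 2, 5, 8, 3, 0, 4, 0),
           B = c(6, 4, 3, 0, 0, 0, 0, 5, 0, 0),
           C = c(0, 0, 8, 0, 0, 8, 0, 0, 0, 0))

# For each row: sort decreasing, take the 3 largest, average.
top3mean <- apply(m, 1, function(x) mean(sort(x, decreasing = TRUE)[1:3]))
round(top3mean, 2)
#    A    B    C
# 5.67 5.00 5.33
```

For a data frame of 1,000 observations, the same apply() call works on as.matrix(df[, paste0("v", 1:10)]).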
[R] tkbutton and functions help
Hey there,

I'm quite lost in the R tcltk package and would love to get some help. My problem is this:

total_subsets <- 5  # example
subcounter <- 1
# while the maximum number of subsets is not reached, put buttons onto the tt widget
while (subcounter <= total_subsets) {
  subname <- paste("subset_", 1:total_subsets, sep = "")
  subnameasstring2 <- subname[subcounter]
  toString(subnameasstring2)
  subname.button <- paste(subnameasstring2, ".button", sep = "")
  subname.button <- tkbutton(tt, text = subnameasstring2,
                             command = function() iconfigure(vhight, vwidth,
                                 vanzheight, vanzwidth,
                                 total_number_iplots_onscreen, subcounter))
  tkgrid(subname.button)
  subcounter <- subcounter + 1
}

I create dynamic buttons, but when I press them the value of e.g. subcounter has been overwritten by the last value assigned: button subset_1.button gives me 6 when I press it, and that's obviously wrong because it should be 1. Has anybody an idea how I can solve that? Btw: the goal is to spread isets onto the screen (but a lot of them).
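This is the classic closure-capture problem: all the callbacks share the same subcounter variable, and by the time a button is pressed the loop has finished, so every callback sees the final value. The usual fix is a factory function that forces a private copy per button. A minimal sketch that demonstrates the mechanism without needing tcltk (make_command and the callback list are hypothetical names; in the real code the returned function would be passed as the command= argument of tkbutton):

```r
# Factory: each call captures its own copy of i in a fresh environment.
make_command <- function(i) {
  force(i)        # evaluate i now, before the loop moves on
  function() i    # stand-in for the real button callback
}

callbacks <- list()
for (subcounter in 1:5) {
  callbacks[[subcounter]] <- make_command(subcounter)
}

callbacks[[1]]()  # 1, not 5: each callback kept the value it was built with
callbacks[[5]]()  # 5
```

Without force() (or some other use of i inside the factory before it returns), lazy evaluation would defer reading i and the same late-binding surprise could reappear.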
[R] Time series fitting and residual computing
Dear R users,

I have a CO2 time series which I want to fit with a fourth-harmonic function and then compute residuals. I tried two options but am not sure which one is correct; can anyone kindly help correct me?

# I would like to fit a model with a fourth-harmonic function:
# m(t) = a + b*t + c*t^2
#        + c1*sin(omega1*t) + d1*cos(omega1*t) + c2*sin(omega2*t) + d2*cos(omega2*t)
#        + c3*sin(omega3*t) + d3*cos(omega3*t) + c4*sin(omega4*t) + d4*cos(omega4*t)
# In my data 'file' I have two variables, 'co2obs' and 'time',
# so the above function in R will look EITHER like:

testfit <- lm(co2obs ~ 1 + time + (time^2)
              + sin(2*pi*time) + cos(2*pi*time) + sin(4*pi*time) + cos(4*pi*time)
              + sin(6*pi*time) + cos(6*pi*time) + sin(8*pi*time) + cos(8*pi*time),
              data = file)
# residuals for the above fit
testfit$residuals

# OR like:

testfit <- lm(co2obs ~ 1 + time + I(time^2)
              + sin(2*pi*time) + cos(2*pi*time) + sin(4*pi*time) + cos(4*pi*time)
              + sin(6*pi*time) + cos(6*pi*time) + sin(8*pi*time) + cos(8*pi*time),
              data = file)
# residuals for the above fit
testfit$residuals

NOTE: the third term (c*t^2) is written differently in the two versions; the rest of the terms are the same. Kindly advise which one of the above is correct, or whether both are wrong.

Great thanks, regards, Yogesh
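The two versions are not equivalent, and a tiny simulated example shows why: inside a model formula, ^ is the crossing operator, so (time^2) collapses to just time, whereas I(time^2) protects the arithmetic meaning and gives a genuine quadratic term. The second version in the question is therefore the correct one. The data below are made up purely to show the difference:

```r
set.seed(1)
time <- seq(0, 4, length.out = 200)
y <- 1 + 2 * time + 0.5 * time^2 + rnorm(200, sd = 0.1)  # simulated quadratic trend

fit1 <- lm(y ~ time + (time^2))   # (time^2) collapses to time: no quadratic term
fit2 <- lm(y ~ time + I(time^2))  # I() protects ^, giving the intended term

length(coef(fit1))  # 2 -- intercept and time only
length(coef(fit2))  # 3 -- intercept, time, and I(time^2)
```

The residuals are then resid(fit2) (equivalently fit2$residuals), and poly(time, 2) is another common way to write the quadratic part.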
Re: [R] Change data frame column names
On 7/15/2009 10:35 AM, Tom Liptrot wrote:

  df <- data.frame(a=1:5, b=2:6, d=3:7, e=4:8)
  coltitles <- data.frame(v1="col number one", v2="col number two",
                          v3="col number three", v4="col number four")
  names(df) <- coltitles
  [...]

Not sure if my first reply went out; it had an error in it, because I misread what you were trying to do. You want to assign a character vector as names. You can set it up like that originally using

coltitles <- c("col number one", "col number two", "col number three", "col number four")

and then your first attempt will work. If you need to get the names out of a data frame, then use

names(x) <- coltitles[1,]

to select the first row (or some other row if you want) of the data frame to use as names.

Duncan Murdoch
Re: [R] problem with merging matrices
On Wednesday, July 15, 2009 8:28 AM, jurgen claesen wrote:

  ... I'm a relatively new user of R and I have a problem with merging a collection of matrices. All matrices in this collection have the same dimension (532 rows and 532 columns), but can differ in the row and column names. I'd like to merge these matrices in such a way that the resulting matrix only contains unique row and column names, and in case of a match between the row or column names the smallest value is retained. As an example says more:

  A1 <- matrix(c(1:9), ncol=3, byrow=TRUE)
  rownames(A1) <- colnames(A1) <- c("a","b","c")

    a b c
  a 1 2 3
  b 4 5 6
  c 7 8 9

  A2 <- matrix(c(7,1,3,10,2,7,3,1,8), ncol=3, byrow=TRUE)
  rownames(A2) <- colnames(A2) <- c("a","y","z")

     a y z
  a  7 1 3
  y 10 2 7
  z  3 1 8

  I want something like this to be returned:

     a  b  c  y  z
  a  1  2  3  1  3
  b  4  5  6 NA NA
  c  7  8  9 NA NA
  y 10 NA NA  5  6
  z  3 NA NA  8  9

Two questions: (a) how do you want to decide which of the elements gets dropped out - like the value 7 in A2[1,1] above - and (b) what goes in the bottom corner of the new matrix? I would have guessed I would see

  2 7
  1 8

in the corner. Was that a typo, or do you really want to overwrite the values of the submatrix that *is* in A1?

david

-- David Huffer, Ph.D., Senior Statistician, CSOSA/Washington, DC, david.huf...@csosa.gov
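Assuming the intended rule is "take the smaller value wherever both matrices have one" (the reading David questions above), one hedged sketch is to expand each matrix onto the union of row and column names and then take the element-wise minimum with pmin(..., na.rm = TRUE); the expand() helper is a name made up here:

```r
A1 <- matrix(1:9, ncol = 3, byrow = TRUE,
             dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
A2 <- matrix(c(7, 1, 3, 10, 2, 7, 3, 1, 8), ncol = 3, byrow = TRUE,
             dimnames = list(c("a", "y", "z"), c("a", "y", "z")))

# Place a matrix into an NA-filled frame covering the union of names.
expand <- function(m, rn, cn) {
  out <- matrix(NA_real_, length(rn), length(cn), dimnames = list(rn, cn))
  out[rownames(m), colnames(m)] <- m
  out
}

rn <- union(rownames(A1), rownames(A2))
cn <- union(colnames(A1), colnames(A2))

# Element-wise minimum; na.rm = TRUE keeps the value present in only one matrix.
merged <- pmin(expand(A1, rn, cn), expand(A2, rn, cn), na.rm = TRUE)
merged["a", "a"]  # 1 -- the smaller of A1's 1 and A2's 7
```

Cells covered by neither matrix remain NA; with this rule the "y"/"z" block comes from A2 (so the corner holds A2's 2, 7, 1, 8), which differs from the example output in the question at exactly the cells David asks about.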
Re: [R] Plotting hourly time-series data loaded from file using plot.ts
Try the zoo package: Lines - time[sec] , Factor1 , Factor2 00:00:00 01.01.2007 , 0. , 0.176083 01:00:00 01.01.2007 , 0. , 0.176417 11:00:00 10.06.2007 , 0. , 0.148250 12:00:00 10.06.2007 , NA , 0.147000 13:00:00 10.06.2007 , NA , 0.144417 library(zoo) library(chron) z - read.zoo(textConnection(Lines), sep = ,, header = TRUE, format = %H:%M:%S %m.%d.%Y, tz = , strip.white = TRUE) plot(z) and read the three vignetttes (pdf documents) that come with it. On Wed, Jul 15, 2009 at 10:07 AM, Keithkigio...@gmail.com wrote: Hello everyone, I am just a tyro in R and would like your kindly help for some problems which I've been struggling for a while but still in vain. I have a time-series file (with some missing value ) which looks like time[sec] , Factor1 , Factor2 00:00:00 01.01.2007 , 0. , 0.176083 01:00:00 01.01.2007 , 0. , 0.176417 [ ... ] 11:00:00 10.06.2007 , 0. , 0.148250 12:00:00 10.06.2007 , NA , 0.147000 13:00:00 10.06.2007 , NA , 0.144417 [ ... ] and I would like to do some basic time-series analyses using R. The first idea is to plot these time-series events and the main problem was the handling of the date/time format in the 1st column. I was using the script below to deal with: data - read.table(file,header=TRUE,sep=,,colClasses=c(character,numeric,numeric)) data$time.sec. - as.POSIXct(data$time.sec.,format=%H:%M:%S %d.%m.%Y) dataTs - as.ts(data) plot.ts(dataTs) Then, the plot showed up with 3 subplots in one plot. The 1st is the linear line with the x-axis being just the sequence of orders and y-axis being wrong numbers which is completely wrong. The 2nd and the 3rd are correct but the x-axis is still wrong. Does anyone know how to plot correct Factor1 and Factor2 with respect to the 1st column time format? Or, probably should I use some other packages? Besides, how can I plot these two time-series data (Factor1 and Factor2) in two separate plots? 
Best regards, Keith
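On Keith's last question (separate plots for Factor1 and Factor2): with a multi-column zoo object, plot() already draws one panel per series by default, and the screens argument controls this. A minimal sketch with invented data, not Keith's file:

```r
library(zoo)

# Two hourly series with made-up values, standing in for Factor1/Factor2.
times <- as.POSIXct("2007-01-01 00:00:00", tz = "GMT") + 3600 * (0:4)
z <- zoo(cbind(Factor1 = c(0.1, 0.2, NA, 0.4, 0.5),
               Factor2 = c(0.176, 0.176, 0.148, 0.147, 0.144)),
         order.by = times)

pdf(NULL)            # null device so the sketch also runs non-interactively
plot(z)              # default: one panel per column, i.e. two separate plots
plot(z, screens = 1) # both series overlaid in a single panel
dev.off()
```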
Re: [R] Change data frame column names
On Wed, 2009-07-15 at 14:35 +0000, Tom Liptrot wrote: Hi R helpers, I have a data frame and I want to change the column names to names I have held in another data frame, but I am having difficulty. All data frames are large, so I can't input the names manually. Below is what I have tried so far:

df <- data.frame(a=1:5, b=2:6, d=3:7, e=4:8)
coltitles <- data.frame(v1="col number one", v2="col number two",
    v3="col number three", v4="col number four")

## first attempt
names(df) <- coltitles
names(df)
[1] "1" "1" "1" "1"
### not what I wanted, as I want names(df) to return
[1] "col number one" "col number two" "col number three" "col number four"

## second attempt: trying to convert to a character vector after looking at help
coltitles <- as.vector(coltitles, mode="character")
is.vector(coltitles)
[1] TRUE
names(df) <- coltitles
names(df)
[1] "1" "1" "1" "1"
### again not what I wanted

How can I convert the column names?

This seems a bit of a strange way to go about things, but if needs must:

names(df) <- unlist(coltitles)
df
  col number one col number two col number three col number four
1              1              2                3               4
2              2              3                4               5
3              3              4                5               6
4              4              5                6               7
5              5              6                7               8
names(df)
[1] "col number one" "col number two" "col number three" "col number four"

If your data frame holds the names in its one (and only) row (why store this as a one-row df and not a character vector??? [1]), then by unlisting it we get a character vector (or something that can be coerced to one) of the correct names. If coltitles contains more rows, then:

names(df) <- unlist(coltitles[1, ])

might be more appropriate.

HTH

G

[1] Your df storage of names is so inefficient (for this task):

object.size(names(df)) # after running the above code
312 bytes
object.size(coltitles)
2728 bytes

Thanks in advance, Tom
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson          [t] +44 (0)20 7679 0522
ECRC, UCL Geography,       [f] +44 (0)20 7679 0565
Pearson Building,          [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London       [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.              [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Re: [R] Problems with computing an aggregated score
On Wed, Jul 15, 2009 at 3:37 AM, Hatsch johannes.h...@googlemail.com wrote: Hi all, I have a problem with computing a new variable as an aggregated score of other variables. Say I have 10 variables for 1,000 observations (people). Every variable may have values between 0 and 8. What I would like to do is compute the mean of the individual top 3 values for every person. Example: the values for the 10 variables (v1 to v10) for persons A, B and C are as follows:

A 0 1 0 2 5 8 3 0 4 0
B 6 4 3 0 0 0 0 5 0 0
C 0 0 8 0 0 8 0 0 0 0

So I would like to compute a new variable for the mean of the individual top 3 values, which would result for persons A, B and C in the following:

A (8+5+4)/3 = 5.67
B (6+5+4)/3 = 5
C (8+8+0)/3 = 5.33

Is there any way to do this? Any clues, hints and suggestions are highly appreciated. Many thanks in advance, Johannes

Unquestionably the experienced guys are going to give you a better answer, but as a newbie I worked out a step-by-step answer in a few minutes. This lets you see the steps:

A <- c(0, 1, 0, 2, 5, 8, 3, 0, 4, 0)
A[order(as.numeric(A), decreasing=TRUE)]
A[order(as.numeric(A), decreasing=TRUE)][1:3]
sum(A[order(as.numeric(A), decreasing=TRUE)][1:3])
sum(A[order(as.numeric(A), decreasing=TRUE)][1:3])/3

Hope this helps,
Mark
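Mark's per-vector steps extend to all persons at once with apply over the rows. A sketch using the example values from the question:

```r
# One row of scores per person, as in Johannes's example.
scores <- rbind(A = c(0,1,0,2,5,8,3,0,4,0),
                B = c(6,4,3,0,0,0,0,5,0,0),
                C = c(0,0,8,0,0,8,0,0,0,0))

# Mean of the top 3 values in each row.
top3mean <- apply(scores, 1, function(v) mean(sort(v, decreasing = TRUE)[1:3]))
round(top3mean, 2)
#    A    B    C
# 5.67 5.00 5.33
```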
Re: [R] Change data frame column names
Try

names(df) <- as.vector(unlist(coltitles))

This will fail if you have more than one row in your data frame coltitles, which is why Duncan Murdoch's second email has a better solution. Given the difficulty of getting the titles out of the data frame in vector form, I would wonder why you're storing them as data in a data frame in the first place.

Note also that the default behavior when creating a data frame is for character data to be converted to factors. You can see this with

class(coltitles$v1)
[1] "factor"

That's part of what gave you weird results. So I would suggest looking at the help pages to figure out how to prevent conversion to factor.

If you want names(df) to return "col number one" etc., then you have to create a data frame whose column names are those. You created a data frame (coltitles) whose names were v1, v2, etc. For example:

tmp <- data.frame("col number one"=1, "col number two"="foobar",
    check.names=FALSE)
names(tmp)
[1] "col number one" "col number two"

But it is simpler and easier to just do as Duncan suggested and create a vector of names.

-Don

At 2:35 PM +0000 7/15/09, Tom Liptrot wrote: Hi R helpers, I have a data frame and I want to change the column names to names I have held in another data frame, but I am having difficulty. All data frames are large, so I can't input the names manually. Below is what I have tried so far:

df <- data.frame(a=1:5, b=2:6, d=3:7, e=4:8)
coltitles <- data.frame(v1="col number one", v2="col number two",
    v3="col number three", v4="col number four")

## first attempt
names(df) <- coltitles
names(df)
[1] "1" "1" "1" "1"
### not what I wanted, as I want names(df) to return
[1] "col number one" "col number two" "col number three" "col number four"

## second attempt: trying to convert to a character vector after looking at help
coltitles <- as.vector(coltitles, mode="character")
is.vector(coltitles)
[1] TRUE
names(df) <- coltitles
names(df)
[1] "1" "1" "1" "1"
### again not what I wanted

How can I convert the column names?
Thanks in advance, Tom

--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
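Don's factor point can also be avoided up front: build coltitles with stringsAsFactors = FALSE (the default was TRUE before R 4.0) so the titles stay character. A small sketch:

```r
# Keep the titles as character from the start; then unlist() is enough.
coltitles <- data.frame(v1 = "col number one", v2 = "col number two",
                        v3 = "col number three", v4 = "col number four",
                        stringsAsFactors = FALSE)
df <- data.frame(a = 1:5, b = 2:6, d = 3:7, e = 4:8)
names(df) <- unlist(coltitles)
names(df)
# "col number one" "col number two" "col number three" "col number four"
```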
Re: [R] Question related to merging zoo objects and apply function
Please read the last line of every message to r-help. In particular, simplify this as much as possible and construct some small artificial test data to illustrate.

Anyway, func is probably not what you want. It has the same effect as

func <- function(x, j) x[72+j] + x[144+j]

On Wed, Jul 15, 2009 at 9:26 AM, Sergey Goriatchev serg...@gmail.com wrote: Hello everyone, say I have defined a convenience function in my code like this:

func <- function(x, j){
  x[168+j] <- x[72+j] + x[144+j]
}

And now I apply it to some columns in a big zoo object like this:

for (m in 1:24){
  combined <- merge(combined, LA1sum=apply(combined, 1, func, j=m))
}

The output of this for-loop will be a zoo object with columns named LA1sum.1, LA1sum.2, ..., LA1sum.24. If I have a vector of names like this:

namesVec <- c("LA1sum", "LP1sum", "GC1sum", "LL1sum", "LN1sum", "SI1sum",
    "LX1sum", "CO1sum", "CL1sum", "QS1sum", "HO1sum", "NG1sum", "XB1sum",
    "C.1sum", "FC1sum", "LH1sum", "LC1sum", "S.1sum", "W.1sum", "KW1sum",
    "CC1sum", "KC1sum", "CT1sum", "SB1sum")

what I want is for the name of the new column in merge() to be the m-th element of namesVec, that is:

for m=1 I have LA1sum=apply(..., j=1)
for m=2 I have LP1sum=apply(..., j=2)
etc.

What is the way to do this? Thank you in advance for your help! Regards, Sergey
Re: [R] Question related to merging zoo objects and apply function
Hello, Gabor. Generally I follow the r-help rules and provide working code snippets to illustrate the problem. In this case it was more a methodological question of how to loop over names in the merge() function. I solved this very simply by renaming the specific columns after running merge(). Thank you for your time! Best, Sergey

On Wed, Jul 15, 2009 at 17:18, Gabor Grothendieck ggrothendi...@gmail.com wrote: Please read the last line of every message to r-help. In particular, simplify this as much as possible and construct some small artificial test data to illustrate. Anyway, func is probably not what you want. It has the same effect as

func <- function(x, j) x[72+j] + x[144+j]

On Wed, Jul 15, 2009 at 9:26 AM, Sergey Goriatchev serg...@gmail.com wrote: Hello everyone, say I have defined a convenience function in my code like this:

func <- function(x, j){
  x[168+j] <- x[72+j] + x[144+j]
}

And now I apply it to some columns in a big zoo object like this:

for (m in 1:24){
  combined <- merge(combined, LA1sum=apply(combined, 1, func, j=m))
}

The output of this for-loop will be a zoo object with columns named LA1sum.1, LA1sum.2, ..., LA1sum.24. If I have a vector of names like this:

namesVec <- c("LA1sum", "LP1sum", "GC1sum", "LL1sum", "LN1sum", "SI1sum",
    "LX1sum", "CO1sum", "CL1sum", "QS1sum", "HO1sum", "NG1sum", "XB1sum",
    "C.1sum", "FC1sum", "LH1sum", "LC1sum", "S.1sum", "W.1sum", "KW1sum",
    "CC1sum", "KC1sum", "CT1sum", "SB1sum")

what I want is for the name of the new column in merge() to be the m-th element of namesVec, that is:

for m=1 I have LA1sum=apply(..., j=1)
for m=2 I have LP1sum=apply(..., j=2)
etc.

What is the way to do this? Thank you in advance for your help! Regards, Sergey

--
"I'm not young enough to know everything." /Oscar Wilde
"Experience is one thing you can't get for nothing."
/Oscar Wilde
"When you are finished changing, you're finished." /Benjamin Franklin
"Tell me and I forget, teach me and I remember, involve me and I learn." /Benjamin Franklin
"Luck is where preparation meets opportunity." /George Patten
Re: [R] Grouping data in dataframe
On Wednesday, 15.07.2009 at 00:42 -0500, markle...@verizon.net wrote: Hi!

Hi: I think aggregate does what you want. You had 34 in one of your columns, but I think you meant it to be 33.

DF <- read.table(textConnection("ExpA ExpB ExpC Size
1 12 23 33 1
2 12 24 29 1
3 10 22 34 1
4 25 50 60 2
5 24 53 62 2
6 21 49 61 2"), header=TRUE)
print(DF)
print(str(DF))
aggregate(DF, list(DF$Size), median)

Yes, thanks to you and all the other people who helped! The aggregate function is exactly what I was looking for. Thanks for the help! Regards, Timo
Re: [R] predictive punishment module (was: Re: Simple cat statem
On 15-Jul-09 14:45:26, S Ellison wrote: Duncan Murdoch murd...@stats.uwo.ca 15/07/2009 15:04:29: * R has a new predictive punishment module. It punishes you for things it knows you will do later.

Dang! Does that mean it'll return errors for the statistical idiocy I haven't yet got around to committing? ;-)

Steve E

It is even more twisted and sadistic than that. Being an Expert System on Human Psychology (how else could it succeed so well in putting the user on the wrong foot), it knows that the user will interpret the punishment as being related to the correct thing that the user just did, even though the punishment is for the thing that will be done later. Users will therefore develop an inhibition in respect of correct actions, and will be driven into incorrect behaviour. The incorrect thing that will be done later will be punished anyway. Since the complement of {an incorrect thing} is {correct thing} union {all the other incorrect things}, the user will end up

a) Confused
b) Increasingly likely to do incorrect things, therefore increasingly likely to be punished, finally being punished on almost all occasions of action (predictively or not).

So, now we know ... :(

Ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 15-Jul-09 Time: 16:21:53
-- XFMail --
Re: [R] Problems with computing an aggregated score
Dear Johannes,

If 'x' is your data set, here is one way using apply:

res <- apply(x[, -1], 1, function(x){
  xo <- sort(x, decreasing = TRUE)
  mean(xo[1:3])
})
names(res) <- x[, 1]
res
#    A    B    C
# 5.67 5.00 5.33

See ?apply, ?sort and ?mean for more information.

HTH, Jorge

On Wed, Jul 15, 2009 at 6:37 AM, Hatsch johannes.h...@googlemail.com wrote: Hi all, I have a problem with computing a new variable as an aggregated score of other variables. Say I have 10 variables for 1,000 observations (people). Every variable may have values between 0 and 8. What I would like to do is compute the mean of the individual top 3 values for every person. Example: the values for the 10 variables (v1 to v10) for persons A, B and C are as follows:

A 0 1 0 2 5 8 3 0 4 0
B 6 4 3 0 0 0 0 5 0 0
C 0 0 8 0 0 8 0 0 0 0

So I would like to compute a new variable for the mean of the individual top 3 values, which would result for persons A, B and C in the following:

A (8+5+4)/3 = 5.67
B (6+5+4)/3 = 5
C (8+8+0)/3 = 5.33

Is there any way to do this? Any clues, hints and suggestions are highly appreciated. Many thanks in advance, Johannes
[R] negative Somers D from Design package
Dear R help,

My problem is very similar to the analysis detailed here. If we use the mayo dataset provided with the survivalROC package, the estimate for Somers' Dxy is very negative: -0.56. The Nagelkerke R2 is positive, though: 0.32. I know there is a difference between explained variation and predictive ability, but I am surprised there is such a difference, given that even a non-predictive model should have Dxy around 0. Am I doing something wrong, or is there an interpretation that makes sense? This is with the mayo data, so it's reproducible, but the result with my data is very similar. Many thanks in advance.

library(survivalROC)
library(Design)
library(survival)
data(mayo)
Sm <- Surv(mayo$time, mayo$censor)
fm <- cph(Sm ~ mayoscore4, mayo, x=TRUE, y=TRUE, surv=TRUE)
validate(fm, B=150, dxy=TRUE)

Iteration 1
        index.orig     training          test       optimism  index.corrected   n
Dxy   -0.566027923    -0.55407  -0.566027923  -0.0006374833     -0.565390440  150
R2     0.325860603  0.327350885  0.325860603   0.0014902826      0.324370320  150
Slope  1.0          1.0          0.987854765   0.0121452354      0.987854765  150
D      0.093398440  0.095166239  0.093398440   0.0017677983      0.091630642  150
U     -0.001562582 -0.001579618  0.001150175  -0.0027297932      0.001167211  150
Q      0.094961022  0.096745857  0.092248266   0.0044975915      0.090463431  150

Dr Bernard North
Statistical Consultant
Statistical Advisory Service
Advice and Courses on Research Design and Methodology
Imperial College
South Kensington Campus
Room 845, 4th Floor
8 Princes Gardens
London SW7 1NA
Tel: 020 7594 2034
Fax: 020 7594 1489
Email: bno...@imperial.ac.uk
Web: www.ic.ac.uk/stathelp
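One likely interpretation (my reading, not from the thread): the sign of Dxy only encodes the direction of the association between the predictor and survival time; discrimination is measured by |Dxy|. So -0.56 suggests a strongly predictive model in which higher mayoscore4 goes with shorter survival. A toy sketch with Hmisc's rcorr.cens and simulated data:

```r
library(survival)
library(Hmisc)

set.seed(1)
risk <- rnorm(200)
time <- rexp(200, rate = exp(risk))  # higher risk -> shorter survival times
S <- Surv(time, rep(1, 200))         # no censoring in this toy example

# Dxy comes out negative purely because of the direction of association.
dxy <- rcorr.cens(risk, S)["Dxy"]
dxy
```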
[R] For loop for distinguishing negative numbers
Hi, I am very new to R and I have been trying to change each individual piece of data in a data set to 10 if it is below 0 and 5 if it is above 0. I know this sounds very easy, but I am struggling!
Re: [R] nls, reach limit bounds
... and getting an answer is no assurance that the answer is meaningful. In cases like this (which arise frequently because of insistence on using the accepted mechanistic model paradigm even in the absence of informative data to estimate it), the confidence intervals for (correlated) parameters will usually be so wide as to be useless. That is, for practical purposes, the model is nonidentifiable. This can easily be seen by making small random perturbations in the data and watching how the parameter estimates change -- i.e. performing a sensitivity analysis. Incidentally, the predictions will typically be fine, so the standard scientific practice of graphing the data with an overlaid smooth curve as a check on the validity of the estimated parameters is nonsense. One should not get so lost among the trees of statistical niceties that one loses sight of the scientific forest: the model can't be adequately fit by the data. Either get more data or choose a more appropriate model.

Cheers,
Bert Gunter
Genentech Nonclinical Biostatistics

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ravi Varadhan
Sent: Tuesday, July 14, 2009 4:35 PM
To: 'UyenThao Nguyen'; 'spencerg'
Cc: r-help@r-project.org
Subject: Re: [R] nls, reach limit bounds

I took a quick look at the drc package and the drm function. The drm() function uses optim (BFGS method). So that is one difference. However, without looking at your code on how you used drm(), I cannot tell further. The fact that you got an answer using optim() does not necessarily mean that everything is swell. Did you check the Hessian to see if it is positive definite? You might also want to try the nls.lm() function in the minpack.lm package.

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvarad...@jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

-----Original Message-----
From: UyenThao Nguyen [mailto:ungu...@tethysbio.com]
Sent: Tuesday, July 14, 2009 7:07 PM
To: Ravi Varadhan; 'spencerg'
Cc: r-help@r-project.org
Subject: RE: [R] nls, reach limit bounds

Hi Ravi and Spencer, thank you very much for your help. I did plot the data, and saw that the data didn't seem to have an inflection point. Yes, the data contained 6 points of duplicates, for which the 4-parameter logistic regression is appropriate. I tried the dose-response model (drm in the drc package); this model works without any problem. Do you know if drm has a different tolerance in convergence, or something else? If I choose drm to fit the data, can I get the parameters in the same way nls gives them to me? I tested a converged data set on both drm and nls, and I can't see any relationship between their parameters, although the fits were similar. Thank you. Uyen

-----Original Message-----
From: Ravi Varadhan [mailto:rvarad...@jhmi.edu]
Sent: Monday, July 13, 2009 3:32 PM
To: 'spencerg'; UyenThao Nguyen
Cc: r-help@r-project.org
Subject: RE: [R] nls, reach limit bounds

Hi Uyen, you do not have enough information to estimate 4 parameters in your nonlinear model. Even though you have 12 data points, only 6 contribute independent information (you essentially have 6 replicate points). If you plot the fitted function you will see that it fits your data really well. However, you will also see that the fitted curve is far from reaching the asymptote. An implication of this is that you cannot reliably estimate b0 and b1 separately. So you need more data, especially for larger x values in the asymptotic region.

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvarad...@jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.html

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of spencerg
Sent: Saturday, July 11, 2009 9:50 PM
To: UyenThao Nguyen
Cc: r-help@r-project.org
Subject: Re: [R] nls, reach limit bounds

Have you plotted the data? There are two standard sources of non-convergence problems like this: first, there may not be enough information in your data to estimate all four parameters. Second, if that's not the case, then your starting values are not appropriate. I routinely use optim or nlminb to find
Re: [R] For loop for distinguishing negative numbers
See ?ifelse. You didn't specify what happens if a value is exactly zero in the dataset, so I've just bundled it in with the negative case:

x <- rnorm(20, 0, 1)
y <- ifelse(x <= 0, 10, 5)

HTH,
Tony Breyal

cmga20 wrote: Hi, I am very new to R and I have been trying to change each individual piece of data in a data set to 10 if it is below 0 and 5 if it is above 0. I know this sounds very easy, but I am struggling!
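If the "data set" is a whole data frame rather than a single vector, the same ifelse recode can be applied column-wise. A sketch with invented data:

```r
set.seed(1)
d <- data.frame(a = rnorm(5), b = rnorm(5))

# Replace every value: 10 if <= 0, otherwise 5, keeping the data frame shape.
d[] <- lapply(d, function(col) ifelse(col <= 0, 10, 5))
d
```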
Re: [R] Simulation functions for underdispered Poisson and binomial distributions
On Wed, 15 Jul 2009, Shinichi Nakagawa wrote: Dear R users, I would like to simulate underdispersed Poisson and binomial distributions somehow. I know you can do this for the overdispersed counterparts, using rnbinom() for Poisson and rbetabinom() for binomial. Could anyone share functions to do this? Or please share some tips for modifying existing functions to achieve this.

Shinichi,

You really need a model for the underdispersion. Using that model, you would calculate the probabilities of the binomial or Poisson counts. But you have to come up with something appropriate for your situation. For example,

probs <- prop.table(dbinom(0:10, 10, .5)^3)

or

probs <- prop.table(dbinom(0:10, 10, .5) + ifelse(0:10 == 5, 1, 0))

will each produce probabilities for counts that are less dispersed than

probs <- dbinom(0:10, 10, 0.5)

but neither may suitably model the counts for the situation in which you are interested.

Once you have that model in hand,

sample(0:10, N, prob=probs, replace=TRUE)

will 'simulate' N such counts.

HTH,
Chuck

Thank you very much for your help and time.

Shinichi

Shinichi Nakagawa, PhD (Lecturer of Behavioural Ecology)
Department of Zoology, University of Otago
340 Great King Street, P.O. Box 56, Dunedin, New Zealand
Tel: +64-3-479-5046
Fax: +64-3-479-7584
http://www.otago.ac.nz/zoology/staff/academic/nakagawa.html

Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu
UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, San Diego 92093-0901
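A quick check (not in Chuck's message) that the sharpened probabilities really are underdispersed relative to the plain binomial:

```r
binom <- dbinom(0:10, 10, 0.5)
probs <- prop.table(binom^3)  # Chuck's first suggestion

# Variance of a distribution on the support 0:10.
distvar <- function(p) sum(p * (0:10)^2) - sum(p * (0:10))^2

distvar(probs) < distvar(binom)  # TRUE: less dispersed than Binomial(10, 0.5)
```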
Re: [R] Matrix multiplication precision
Thanks for the explanation. I am not too deep into linear algebra. T.mat, Rz.mat and Q.mat are square 4 x 4 matrices. To be more clear, here is what they look like:

h <- 3.39/2
# translation of the system
T.mat <- matrix(c(1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,-h,1), nrow=4)
alpha <- 36*pi/180
cos.alpha <- cos(alpha)
minus.sin.alpha <- -1*sin(alpha)
sin.alpha <- sin(alpha)
Rz.mat <- matrix(c(cos.alpha, minus.sin.alpha, 0, 0,
                   sin.alpha, cos.alpha, 0, 0,
                   0, 0, 1, 0,
                   0, 0, 0, 1), nrow=4)
Rx.mat <- matrix(c(1, 0, 0, 0,
                   0, cos.alpha, minus.sin.alpha, 0,
                   0, sin.alpha, cos.alpha, 0,
                   0, 0, 0, 1), nrow=4)

Q.mat is a product of Rz.mat and Rx.mat with different angles. The resultant matrix Mn.mat is used to compute the next coordinates of a helix:

center.Of.bases <- matrix(c(0.0, 0.0, 0.0, 1), nrow=4)
# This function computes the Mn.mat matrix using specific angles,
# which is used to get the new coordinates.
Mn.mat <- qr.solve(compute.Mn.mat(T.mat, Rz.mat, Q.mat))
New.center.Of.bases <- as.numeric(Mn.mat %*% center.Of.bases)

Do these lines of code tell you anything about the complexity of the problem? Let me know if I should do anything differently. I was really glad to hear from you since you are an expert in the area. Cheers, /Murli

-----Original Message-----
From: dmba...@gmail.com [mailto:dmba...@gmail.com] On Behalf Of Douglas Bates
Sent: Wednesday, July 15, 2009 7:29 AM
To: Nair, Murlidharan T
Cc: r-help@r-project.org
Subject: Re: [R] Matrix multiplication precision

On Wed, Jul 15, 2009 at 3:42 AM, Nair, Murlidharan T mn...@iusb.edu wrote: Hi!! I am trying to multiply 5 matrices and then using the inverse of that matrix for further computation. I read about precision problems from the archives, and the suggestion was to use as.numeric while computing the products.

Hmm. I'm not sure what the origins of that advice might be, but I don't think it would apply in general.

I am still having problems with the results.
Here is how I am using it:

# Mn.mat <- (T.mat %*% Rz.mat %*% Q.mat %*% Rz.mat %*% T.mat)
# I was doing this in one step, which I have broken down into multiple steps as below.
Mn1.mat <- matrix(as.numeric(T.mat %*% Rz.mat), nrow=4)
Mn2.mat <- matrix(as.numeric(Mn1.mat %*% Q.mat), nrow=4)
Mn3.mat <- matrix(as.numeric(Mn2.mat %*% Rz.mat), nrow=4)
Mn.mat <- matrix(as.numeric(Mn3.mat %*% T.mat), nrow=4)

I doubt very much that the as.numeric would have any effect on precision in these cases. You are simply taking a numeric result in its matrix form, converting it to a vector, then converting the vector back to a matrix.

mm <- matrix(round(rnorm(8), 3), nrow = 4)
mm
       [,1]  [,2]
[1,] -0.110 2.195
[2,]  0.616 0.353
[3,]  0.589 0.970
[4,]  1.028 0.857
as.numeric(mm)
[1] -0.110 0.616 0.589 1.028 2.195 0.353 0.970 0.857

The key in situations like this is to analyze the structure of the computation and decide whether you can group the intermediate results. You have written your product as

T %*% Rz %*% Q %*% Rz %*% T

What are the characteristics of T, Rz, and Q? Are they square and symmetric, square and triangular, diagonal? Is Q orthogonal (the factors in an orthogonal-triangular decomposition are often written as Q and R)? Did you happen to skip a few transposes in that formula? Often formulas like that generate symmetric matrices.

And the big lesson, of course, is the first rule of numerical linear algebra: just because a formula is written in terms of the inverse of a matrix doesn't mean that is a good way to calculate the result; in fact, it is almost inevitably the worst way of calculating the result. You only calculate the inverse of a matrix after you have investigated all other possible avenues for calculating the result and found them to be fruitless.
You haven't told us what you plan to do with the inverse, and that is an important consideration. If, for example, it represents a variance-covariance matrix and you want to express the result as standard deviations and correlations, you would not compute the variance-covariance matrix but stop instead at the Cholesky factor of the inverse.

You may want to check the facilities of the Matrix package for expressing your calculation. It is a recommended package that is included with binary versions of R from 2.9.0 and has many ways of expressing and exploiting structure in matrices.

For getting the inverse I am doing the following:

Mn.mat.inv <- qr.solve(Mn.mat)

Will I run into precision problems when I do the above? Thanks, /Murli
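Bates's "first rule" can be illustrated on a rotation matrix like Murli's Rz (a standalone sketch; compute.Mn.mat itself is never shown in the thread):

```r
alpha <- 36 * pi / 180
Rz <- matrix(c(cos(alpha), -sin(alpha), 0, 0,
               sin(alpha),  cos(alpha), 0, 0,
                        0,           0, 1, 0,
                        0,           0, 0, 1), nrow = 4, byrow = TRUE)
p <- c(1, 0, 0, 1)

x1 <- solve(Rz) %*% p  # forms the explicit inverse, then multiplies
x2 <- solve(Rz, p)     # solves the linear system directly: cheaper, and
                       # generally more accurate on ill-conditioned matrices
```

Here the two agree to machine precision, but the second form is the one Bates is recommending.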
Re: [R] (newbie) sum for certain number of rows
Thanks. I tried it and it worked wonderfully. Wishing for the 'DAY' to come. Life needs to be 'reset'.

--- On Wed, 7/15/09, Dimitris Rizopoulos d.rizopou...@erasmusmc.nl wrote: One way is the following:

dat <- read.table(textConnection("
0 0 1 0 0 1 0 1 0 0 0 0
0 0 0 0 1 0 0 1 1 0 1 0
0 0 1 1 0 0 0 0 1 1 0 0
0 0 1 1 0 0 0 0 0 0 0 0
0 1 0 1 1 0 1 0 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1
"))
closeAllConnections()

k <- 3
ind <- rep(seq(1, nrow(dat)/k), each = k)
rowsum(dat, ind)

I hope it helps. Best, Dimitris

kelvin lau wrote: I have the following data in a data.csv file, separated by spaces:

0 0 1 0 0 1 0 1 0 0 0 0
0 0 0 0 1 0 0 1 1 0 1 0
0 0 1 1 0 0 0 0 1 1 0 0
0 0 1 1 0 0 0 0 0 0 0 0
0 1 0 1 1 0 1 0 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1
etc...

I wish to calculate the sum of each column for a certain number of rows. For example, if I want the sum of the data after every 3 rows, it should display

1 0 1 1 1 1 1 1 1 1 1 1
0 0 1 1 2 3 2 3 3 2 3 2

So far, this is what I have done:

xx <- read.table("data.csv", header=FALSE)
ss <- t(apply(xx, 2, sum)) # which displayed the sum of all rows

I tried my best to look for a solution on the Internet but so far haven't managed to find it. I would be extremely grateful if someone could point me to how to go about it. Thanks, Kelvin
-- Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
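Dimitris's rowsum approach above, as a self-contained runnable sketch (toy data invented here, with 4 columns instead of 12):

```r
# Sum each column over consecutive blocks of k rows.
dat <- as.data.frame(matrix(c(0, 0, 1, 0,
                              0, 1, 0, 1,
                              1, 0, 1, 1,
                              0, 0, 0, 0,
                              1, 1, 0, 0,
                              1, 1, 1, 1), ncol = 4, byrow = TRUE))
k <- 3
ind <- rep(seq_len(nrow(dat) / k), each = k)  # 1 1 1 2 2 2
sums <- rowsum(dat, ind)                      # one row per block of 3 rows
sums
```

rowsum() groups rows by the `ind` labels and sums within each group, which is exactly "sum of each column for every k rows" without any loop.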
Re: [R] read.delim skips first column (why?)
This should work: junk <- read.table("fd0edfab.txt", sep = "\t", header = TRUE, fill = FALSE, quote = "") HTH Paolo Sonego __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
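A common reason read.delim appears to "skip" the first column: when the header row has one fewer field than the data rows, R silently uses the first data column as row names. A minimal illustration (the file contents here are invented, not from the thread):

```r
# Header has 2 fields but each data row has 3:
# read.delim then treats the first column as row names.
txt <- "x\ty\na\t1\t2\nb\t3\t4\n"
d <- read.delim(text = txt)

ncol(d)      # 2, not 3
rownames(d)  # the "missing" first column
```

Supplying `row.names = NULL` (or fixing the header) keeps that column as data.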
[R] Passing additional arguments through '...'
Hi, I know this is a simple question, but I've been having problems passing additional arguments through '...'. It is not matching the arguments correctly if the formal argument of the function begins with the same letter as the additional argument. The following example will help show what I mean: fun.tester <- function(abc, ...){ + print(abc) + } But if I input: fun.tester(0, a=1) It returns the value '1' for abc. It does, however, work properly if I input: fun.tester(abc=0, a=1) It seems like a simple problem, so I would assume I'm doing something stupid, but I haven't been able to find a solution anywhere. Thanks! -- View this message in context: http://www.nabble.com/Passing-additional-arguments-through-%27...%27-tp24501159p24501159.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] FW: problems in resampling time series, block length, trend.test
Hi, I have a time series (say x) of 13 years showing an evident increase. I want to exclude two observations (the fourth and 10th), so I do: trend.test(x[-c(4,10)]) where: x[-c(4,10)] [1] 7 37 79 72 197 385 636 705 700 1500 1900 and I get: Spearman's rank correlation rho data: x[-c(4, 10)] and time(x[-c(4, 10)]) S = 4, p-value < 2.2e-16 alternative hypothesis: true rho is not equal to 0 sample estimates: rho 0.9818182 Very strong positive correlation! The x is increasing. Now, I would like to resample this time series because I have other time series where the trend is more uncertain and the sample size is still small. I read Resampling Methods in R: The boot Package (ISSN 1609-3631) and I believe that a way of doing it is by a block bootstrap, which allows one to deal with the autocorrelation (I know that some of my time series show autocorrelation, lag=1). I used the trend.test function (library pastecs) and I did: trend.x <- trend.test(x[-c(4,10)], R=999) trend.x trend.x$p.value plot(trend.x) And I get: trend.x <- trend.test(x[-c(4,10)], R=999) trend.x BLOCK BOOTSTRAP FOR TIME SERIES Fixed Block Length of 1 Call: tsboot(tseries = x, statistic = test.trend, R = R, l = 1, sim = "fixed") Bootstrap Statistics : original bias std. error t1* 0.9818182 -0.9844001 0.3272114 trend.x$p.value [1] 0 I suppose that the problem arises from the length of the block (1), and in this way I get a rho of 0 (nevertheless I don't understand how it is significant). I would like to change the block length but I am not able to (I tried several different ways, unsuccessfully). So, two questions: 1) How can I change the block length? 2) In terms of block length and type of simulation (sim="fixed" or "geom"), what is the best way of doing it? Thanks in advance for any suggestion, Best wishes Simone
_ [[elided Hotmail spam]] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
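To control the block length directly, one option is to call boot::tsboot with the Spearman statistic yourself rather than going through trend.test (which fixes l = 1). A sketch; the statistic function here is an assumption about what trend.test computes, not taken from its source:

```r
library(boot)  # recommended package, normally installed with R

x <- c(7, 37, 79, 72, 197, 385, 636, 705, 700, 1500, 1900)

# Spearman rank correlation of the series against time.
rho_stat <- function(ts) cor(ts, seq_along(ts), method = "spearman")

set.seed(1)
# l sets the (fixed) block length; sim = "geom" would draw
# geometrically distributed block lengths with mean l instead.
b <- tsboot(x, rho_stat, R = 199, l = 3, sim = "fixed")
b$t0     # observed rho, about 0.98
```

With so short a series (n = 11), any block length leaves few effective blocks, so the bootstrap distribution should be interpreted cautiously.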
Re: [R] Bug in package.skeleton, R 2.9.0?
Daniel Klevebring wrote: Bump Anyone? Thanks Daniel On 13 jul 2009, at 10.57, Daniel Klevebring wrote: Dear all, I am using package.skeleton to build a small package of misc functions for personal use. I have recently discovered that the option force=TRUE doesn't seem to do what it is meant to do. Here's what I'm doing: setwd("/Users/danielk/Documents/R/packages/dk") files <- paste(codebase, dir(codebase, pattern=".R"), sep="/") package.skeleton(name="dk", force=TRUE, code_files=files) Creating directories ... Creating DESCRIPTION ... Creating Read-and-delete-me ... Copying code files ... Making help files ... Done. Further steps are described in './dk/Read-and-delete-me'. Now, everything seems fine, but changes to files in my codebase folder don't come along if the folder dk/R already contains the files, even though I use force=TRUE. If I remove the dk/R folder or the dk folder altogether, the changes come along, so to me it seems that it's the overwrite part that doesn't work as it should - or am I doing something wrong here? To me, it seems that the function safe.dir.create (which is defined in package.skeleton) never overwrites folders, rendering force=TRUE useless. From the help on package.skeleton: force: If 'FALSE' will not overwrite an existing directory. which could be clearer in that package.skeleton will stop with an error if the top level directory, i.e. 'dk' in your case, exists and you had specified force=FALSE. But it will not overwrite the existing directory 'dk' with force=TRUE, as you observed. Looking at the code it is clear that file.copy is used to copy the code files, and that has per default overwrite=FALSE, so your existing code files will not be overwritten. With the current implementation package.skeleton will not do what you intended. There are many ways to work around this, e.g.
fn <- dir(codebase, pattern=".R", full.names=TRUE) file.remove(list.files(path=file.path("dk", "R"), pattern="\\.R", full.names=TRUE)) package.skeleton(name="dk", force=TRUE, code_files=fn) Matthias See below for sessionInfo. Thanks a bunch Daniel sessionInfo() R version 2.9.0 (2009-04-17) i386-apple-darwin8.11.1 locale: en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_2.9.0 -- Contact information: Daniel Klevebring M. Sc. Eng., Ph.D. Student Dept of Gene Technology Royal Institute of Technology, KTH SE-106 91 Stockholm, Sweden Visiting address: Roslagstullsbacken 21, B3 Delivery address: Roslagsvägen 30B, 104 06, Stockholm Invoice address: KTH Fakturaservice, Ref DAKL KTHBIO, Box 24075, SE-10450 Stockholm E-mail: dan...@biotech.kth.se Phone: +46 8 5537 8337 (Office) Phone: +46 704 71 65 91 (Mobile) Web: http://www.biotech.kth.se/genetech/index.html Fax: +46 8 5537 8481 MSN messenger: klevebr...@msn.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Matthias Burger Project Manager/ Biostatistician Epigenomics AG Kleine Praesidentenstr. 1 10178 Berlin, Germany phone: +49-30-24345-0 fax: +49-30-24345-555 http://www.epigenomics.com matthias.bur...@epigenomics.com -- Epigenomics AG Berlin Amtsgericht Charlottenburg HRB 75861 Vorstand: Geert Nygaard (CEO/Vorsitzender) Oliver Schacht
Re: [R] Passing additional arguments through '...'
Please consult the R Language Definition for a detailed explanation, but... In brief, the evaluator first tries to match formal arguments by name, first exactly, then partially, before matching by position, so a partially matches the formal argument abc. e.g. contrast fun.tester(0, b=1) ## b does not partially match abc [1] 0 Bert Gunter Genentech Nonclinical Biostatistics -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of escher2079 Sent: Wednesday, July 15, 2009 9:06 AM To: r-help@r-project.org Subject: [R] Passing additional arguments through '...' Hi, I know this is a simple question, but I've been having problems passing additional arguments through '...'. It is not matching the arguments correctly if the formal argument of the function begins with the same letter as the additional argument. The following example will help show what I mean: fun.tester <- function(abc, ...){ + print(abc) + } But if I input: fun.tester(0, a=1) It returns the value '1' for abc. It does, however, work properly if I input: fun.tester(abc=0, a=1) It seems like a simple problem, so I would assume I'm doing something stupid, but I haven't been able to find a solution anywhere. Thanks! -- View this message in context: http://www.nabble.com/Passing-additional-arguments-through-%27...%27-tp24501159p24501159.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is it possible to use EGARCH and GJR in R?
Hello, On 7/15/09, Andriy Fetsun fet...@googlemail.com wrote: Could you please help me with EGARCH and GJR? Do you mean JGR [1]? If so, install it and try running your code in it. Unless you bump into very specific issues, it should work. Liviu [1] http://cran.r-project.org/web/packages/JGR/index.html __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Change data frame column names
Thanks all, I used Gavin's approach - unlisting the titles and then replacing the names - as my titles were stored as factors in the data frame, since that was the way they were imported... Tom Subject: Re: [R] Change data frame column names From: gavin.simp...@ucl.ac.uk To: tomlipt...@hotmail.com CC: r-help@r-project.org Date: Wed, 15 Jul 2009 16:12:32 +0100 On Wed, 2009-07-15 at 14:35 +0000, Tom Liptrot wrote: Hi R helpers, I have a data frame and I want to change the column names to names I have held in another data frame, but I am having difficulty. All data frames are large so I can't input manually. Below is what I have tried so far: df <- data.frame(a=1:5, b=2:6, d=3:7, e=4:8) coltitles <- data.frame(v1="col number one", v2="col number two", v3="col number three", v4="col number four") ##first attempt names(df) <- coltitles names(df) [1] "1" "1" "1" "1" ###not what I wanted, as I want names(df) to return [1] "col number one" "col number two" "col number three" "col number four" ##second attempt coltitles <- as.vector(coltitles, mode="character") ##trying to convert to a character vector after looking at help is.vector(coltitles) [1] TRUE names(df) <- coltitles names(df) [1] "1" "1" "1" "1" ###again not what I wanted How can I convert the column names? This seems a bit of a strange way to go about things, but if needs must: names(df) <- unlist(coltitles) df col number one col number two col number three col number four 1 1 2 3 4 2 2 3 4 5 3 3 4 5 6 4 4 5 6 7 5 5 6 7 8 names(df) [1] "col number one" "col number two" "col number three" "col number four" If your data frame holds the names in its one (and only) row (why store this as a one-row df and not a character vector??? [1]), then by unlisting it we get a character vector (or something that can be coerced to one) of the correct names. If coltitles contains more rows, then: names(df) <- unlist(coltitles[1,]) might be more appropriate.
HTH G [1] Your df storage of names is so inefficient (for this task): object.size(names(df)) # after running the above code 312 bytes object.size(coltitles) 2728 bytes Thanks in advance, Tom [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
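Gavin's unlist approach from the thread above, as a self-contained sketch:

```r
df <- data.frame(a = 1:5, b = 2:6, d = 3:7, e = 4:8)
# stringsAsFactors = TRUE mirrors the pre-R-4.0 default in the thread;
# the approach works for character columns too.
coltitles <- data.frame(v1 = "col number one", v2 = "col number two",
                        v3 = "col number three", v4 = "col number four",
                        stringsAsFactors = TRUE)

# Assigning the data frame itself fails because each column is a
# one-level factor; unlist() flattens it to a vector that names<-
# can coerce to character.
names(df) <- unlist(coltitles)
names(df)
```

Storing the titles in a plain character vector from the start avoids the coercion entirely.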
Re: [R] negative Somers D from Design package
For the Cox model Dxy is the rank correlation between the predicted log hazard and the time to event. As a high hazard means that the time to event is short, you need to negate Dxy for your purpose. Frank North, Bernard V wrote: Dear R help My problem is very similar to the analysis detailed here. If we use the mayo dataset provided with the survivalROC package, the estimate for Somers' Dxy is very negative, -0.56. The Nagelkerke R2 is positive though, 0.32. I know there is a difference between explained variation and predictive ability, but I am surprised there is such a difference given that even a non-predictive model should have Dxy around 0. Am I doing something wrong or is there an interpretation that makes sense? This is with the mayo data so it's reproducible, but the result with my data is very similar. Many thanks in advance library(survivalROC) library(Design) library(survival) data(mayo) Sm <- Surv(mayo$time, mayo$censor) fm <- cph(Sm ~ mayoscore4, mayo, x=T, y=T, surv=T) validate(fm, B=150, dxy=T) Iteration 1 index.orig training test optimism index.corrected n Dxy -0.566027923 -0.55407 -0.566027923 -0.0006374833 -0.565390440 150 R2 0.325860603 0.327350885 0.325860603 0.0014902826 0.324370320 150 Slope 1.0 1.0 0.987854765 0.0121452354 0.987854765 150 D 0.093398440 0.095166239 0.093398440 0.0017677983 0.091630642 150 U -0.001562582 -0.001579618 0.001150175 -0.0027297932 0.001167211 150 Q 0.094961022 0.096745857 0.092248266 0.0044975915 0.090463431 150 Dr Bernard North Statistical Consultant Statistical Advisory Service Advice and Courses on Research Design and Methodology Imperial College South Kensington Campus Room 845, 4th Floor 8 Princes Gardens London SW7 1NA.
Tel: 020 7594 2034 Fax: 020 7594 1489 Email: bno...@imperial.ac.uk Web: www.ic.ac.uk/stathelp [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Passing additional arguments through '...'
On Wed, 15 Jul 2009, escher2079 wrote: Hi, I know this is a simple question, but I've been having problems passing additional arguments through '...'. It is not matching the arguments correctly if the formal argument of the function begins with the same letter as the additional argument. The following example will help show what I mean: fun.tester <- function(abc, ...){ + print(abc) + } But if I input: fun.tester(0, a=1) It returns the value '1' for abc. It does, however, work properly if I input: fun.tester(abc=0, a=1) I think you'll need to dig into sys.call() and match.call() and put together your own matching scheme to force a function to first match by position and then match all else by name. If match.call() is unfamiliar to you, it is advised to read the first 10 lines of lm(). HTH, Chuck p.s. every argument that comes AFTER '...' in the formals must match exactly. Perhaps this would help you. It seems like a simple problem, so I would assume I'm doing something stupid, but I haven't been able to find a solution anywhere. Thanks! -- View this message in context: http://www.nabble.com/Passing-additional-arguments-through-%27...%27-tp24501159p24501159.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
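The matching order described in this thread (exact names, then partial names, then position, with post-'...' arguments requiring exact names) can be checked directly:

```r
fun.tester <- function(abc, ...) abc

fun.tester(0, a = 1)        # 'a' partially matches 'abc', so abc is 1;
                            # the 0 falls through to '...'
fun.tester(abc = 0, a = 1)  # exact match wins: abc is 0, 'a' goes to '...'
fun.tester(0, b = 1)        # 'b' matches nothing: abc is 0 by position

# Formals declared AFTER '...' must be named in full:
g <- function(..., verbose = FALSE) verbose
g(verb = TRUE)     # 'verb' is swallowed by '...'; verbose stays FALSE
g(verbose = TRUE)  # exact name sets it
```

This is why putting an argument after '...' (as Chuck notes) is the standard way to protect it from accidental partial matches.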
[R] spreadsheet-style autoupdating
Dear all, I am curious whether one can automatically update, say, a column in R, similar to how spreadsheets (Excel, etc.) do. In my specific case I have two columns with logical data, which individually convey different information, although ultimately for a common purpose (flagging for removal). I would thus need a third logical variable that would take TRUE if either of the previous columns takes the value TRUE for the given line. Without using RExcel, would there be a way of automating this (by 'manual' I mean running some code to recreate the third variable prior to printing it)? To R purists: I am not sure that this would amount to good programming habits. Also, this way of thinking comes from my infancy with Excel; should there be a better approach, please let me know. Thank you Liviu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
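R has no spreadsheet-style reactive cells, but the derived column can be recomputed in one vectorized step whenever the inputs change; a sketch with invented column names:

```r
df <- data.frame(flag1 = c(TRUE, FALSE, FALSE),
                 flag2 = c(FALSE, FALSE, TRUE))

# Recompute the derived column after any change to flag1/flag2:
df$remove <- df$flag1 | df$flag2
df$remove

# Or wrap the rule in a small refresher to call before printing:
refresh <- function(d) { d$remove <- d$flag1 | d$flag2; d }
df <- refresh(df)
```

The idiomatic R habit is to treat the third column as something you derive on demand rather than as state to keep synchronized.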
[R] Help with averaging
Hi I am using the following script to average a set of data of 62 columns into 31 columns. The data consists of values of ln(0.01), i.e. -4.60517, instead of NA's. These need to be averaged for each row (i.e. 2 values being averaged). What would I need to change to meet these conditions: 1. If each run of the sample has a value, the average is given 2. If only one run of the sample has a value, that value is given (i.e. no averaging) 3. If both runs have missing values, NA is given. I have tried changing (na.strings = "NA") to (na.strings = "-4.60517") but this causes all the pairs with even one NA to return an NA value. I would prefer not to use for loops as this would slow the script down considerably. #SCRIPT STARTS rm(list=ls()) setwd("C:/Documents and Settings/Amit Patel") #na.strings makes NA's readable zz <- read.csv("Pr9549_LabelFreeData_ByExperimentAK.csv", strip.white = TRUE, na.strings = "NA") ix <- seq(from=2, to=62, by=2) averagedResults <- (zz[,ix] + zz[,ix+1])/2 averagedResults <- cbind(zz[,1], averagedResults) colnames(averagedResults) <- c("PCI","G1-C1","G1-C2","G1-C3","G1-C4","G1-C5","G1-C6","G1-C7","G1-C8","G2-C9","G2-C10","G2-C11","G2-C12","G2-C13","G2-C14","G2-C15","G2-C16","G3-C17","G3-C18","G3-C19","G3-C20","G3-C21","G3-C22","G3-C23","G4-C24","G4-C25","G4-C26","G4-C27","G4-C28","G4-C29","G4-C30","G4-C31") write.csv(averagedResults, file = "Pr9549_averagedreplicates.csv", row.names = FALSE) #SCRIPT ENDS __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
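The three rules (average when both present, keep the single value, NA when both missing) are what rowMeans with na.rm = TRUE gives on each pair of columns; the `(a + b)/2` form in the script fails because any NA propagates. A sketch on toy data (column names invented):

```r
zz <- data.frame(id   = 1:4,
                 run1 = c(1.0, NA,  3.0, NA),
                 run2 = c(2.0, 5.0, NA,  NA))

ix  <- seq(2, ncol(zz) - 1, by = 2)  # first column of each pair
avg <- sapply(ix, function(j) rowMeans(zz[, c(j, j + 1)], na.rm = TRUE))

# rowMeans returns NaN when BOTH entries are missing; map that to NA:
avg[is.nan(avg)] <- NA
avg  # 1.5, 5, 3, NA
```

No loop over rows is needed; the only loop is over the 31 column pairs.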
[R] searching for elements
Hello! For the past three years, I have been using R extensively in my PhD program in Finance for statistical work. Normally, I can figure this kind of thing out on my own, but I am completely stumped as to why the following code does not work. I have two variables: sigs and p0_recent. dim(sigs) = 296 3 dim(p0_recent) = 504 7 I want to compare each element in the first column of sigs with the elements in the first column of p0_recent. In other words, does sigs[i,1] == p0_recent[j,1], where i = 1:dim(sigs)[1] and j = 1:dim(p0_recent)[1]? I've been trying: for (j in 1:dim(p0_recent)[1]) { + for (i in 1:dim(sigs)[1]) { + if (sigs[i,1] == p0_recent[j,1]) { + print(sigs[i,1]) + }}} But, I get: Error in Ops.factor(sigs[i, 1], p0_recent[j, 1]) : level sets of factors are different Is there a better way than for loops to compare each element in one column to each element in another column of different length? If not, can anyone suggest a solution? CWE __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
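A loop-free, factor-safe sketch of the comparison being asked for (the values are made up):

```r
sigs_col <- factor(c("AAA", "BBB", "CCC"))
p0_col   <- factor(c("CCC", "DDD", "AAA"))

# Convert factors to character so differing level sets don't matter,
# then compare whole columns at once instead of nested loops:
common <- intersect(as.character(sigs_col), as.character(p0_col))
common  # the values present in both columns

# Or a logical index into sigs for subsetting:
hits <- as.character(sigs_col) %in% as.character(p0_col)
```

`%in%`/intersect scale far better than the double loop, and converting to character sidesteps the Ops.factor error entirely.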
[R] Differing Variable Length Inconsistencies in Random Effects/Regression Models
Dear All, I am quite new to R and am having a problem trying to run a linear model with random effects / a regression: my variable lengths are different and the models refuse to compute any further. The code I have been using is as follows: vc <- read.table("P:\\R\\Testvcomp10.txt", header=T) attach(vc) family <- factor(family) colms <- vc[, 4:13] ## this assigns the 10 columns containing marker data to a new variable, as the column names are themselves not in any recognisable sequence vcdf <- data.frame(family, peg.no, ec.length, syll.length, colms) library(lme4) for (c in levels(family)) + { for (i in 1:length(colms)) + { fit <- lmer(peg.no ~ 1 + (1|c/i), vcdf) + } + summ <- summary(fit) + av <- anova(fit) + print(summ) + print(av) + } This gives me: Error in model.frame.default(data = vcdf, formula = peg.no ~ 1 + (1 + : variable lengths differ (found for 'c') I had posted a similar message on the R mixed model list a few days ago, with respect to my fundamental methods, and Jerome Goudet had kindly referred me to an alternative approach using residuals obtained from a random effects model in lmer(), and then doing regressions using those [the residuals being the dependent variable and my marker data columns the independent variables].
The code for that is as follows: vc <- read.table("P:\\R\\Text Files\\Testvcomp10.txt", header=T, sep="", dec=".", na.strings="NA", strip.white=T) attach(vc) family <- factor(family) colms <- vc[, 4:13] names(vc) [1] male.parent family offspring.id P1L55 P1L73 [6] P1L74 P1L77 P1L91 P1L96 P1L98 [11] P1L100 P1L114 P1L118 peg.no ec.length [16] syll.length vcdf <- data.frame(family, colms, peg.no, ec.length, syll.length) library(lme4) famfit <- lmer(peg.no ~ 1 + (1|family), na.action=na.omit, vcdf) resfam <- residuals(famfit) for (i in 1:length(colms)) + { + print("Marker", i) + regfam <- abline(lm(resfam ~ i)) + print(regfam) + } This again gives me the error: [1] Marker Error in model.frame.default(formula = resfam ~ i, drop.unused.levels = TRUE) : variable lengths differ (found for 'i') My variables all have missing values somewhere or other. The missing values are not consistent across individuals, i.e. some individuals have some values missing, others have others, and as much as I have tried to use na.action=na.omit and na.rm=T, the differing-variable-length problem is dogging me persistently. I also tried to isolate the residuals, save them in a new variable (called 'resfam' here), and save that in the data frame vcdf that I had created earlier. The problem with that was that when the residuals were computed, lmer() omitted the missing data in 'peg.no' with respect to 'family', which is obviously not the same data missing for, say, another variable, e.g. 'ec.length' - leading again to an inconsistency in variable lengths. data.frame() would then not accept that addition to the previous set. This was fairly obvious right from the start, but I decided to try it anyway. It didn't work. I apologise if the solution to working with these different variable lengths is obvious and I don't know it - but I don't know R that well at all.
My data files can be downloaded at the following location: http://www.filesanywhere.com/fs/v.aspx?v=896d6b88616173be71ab (excel- .xlsx) http://www.filesanywhere.com/fs/v.aspx?v=896d6b88616174a76e9e (.txt file) Any pointers would be greatly appreciated, as this is holding me up loads. Thanks a ton for your help, Aditi -- A Singh aditi.si...@bristol.ac.uk School of Biological Sciences University of Bristol __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] searching for elements
Hi, On Jul 15, 2009, at 1:00 PM, Chyden Finance wrote: Hello! For the past three years, I have been using R extensively in my PhD program in Finance for statistical work. Normally, I can figure this kind of thing out on my own, but I am completely stumped as to why the following code does not work. I have two variables: sigs and p0_recent. dim(sigs) = 296 3 dim(p0_recent) = 504 7 I want to compare each element in the first column of sigs with the elements in the first column of p0_recent. In other words, does sigs[i,1] == p0_recent[j,1], where i = 1:dim(sigs)[1] and j = 1:dim(p0_recent)[1]. I've been trying: for (j in 1:dim(p0_recent)[1]) { + for (i in 1:dim(sigs)[1]) { + if (sigs[i,1] == p0_recent[j,1]) { + print(sigs[i,1]) + }}} But, I get: Error in Ops.factor(sigs[i, 1], p0_recent[j, 1]) : level sets of factors are different It seems that this particular problem is due to the fact that you are comparing two sets of factors with different levels, which is what the Ops.factor error is saying. But let's make it more clear: R> f1 <- factor(c('a','b','c')) R> f1 [1] a b c Levels: a b c R> f2 <- factor(c('c','d','e')) R> f2 [1] c d e Levels: c d e R> f1[3] == f2[1] Error in Ops.factor(f1[3], f2[1]) : level sets of factors are different Is there a better way than for loops to compare each element in one column to each element in another column of different length? If not, can anyone suggest a solution? Probably better ways, but how about starting by first nuking the inner loop? Something like this should work: for (j in 1:nrow(p0_recent)) { found <- match(p0_recent[j,1], sigs[,1]) ... } In each iteration found will be NA if the p0_recent element does not appear in sigs[,1]; otherwise it will be the index of the first element of sigs[,1] that equals the p0_recent element you're querying with.
(Assuming you fix your factor/level issue) HTH, -steve -- Steve Lianoglou Graduate Student: Physiology, Biophysics and Systems Biology Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Spaces in a name
Not sure how to get formula to handle spaces, but to get going you can just change the names in the data frame after you have read it, e.g. names(mcReg) <- make.names(names(mcReg)) will (usually) give you Small.Bank.Acquired. Allan On 15/07/09 15:40, Idgarad wrote: I am reading regressors from an excel file (I have no control over the file) and some of the element names have spaces, i.e. Small Bank Aquired, but I have found that lm(SourceData ~ . - Small Bank Aquired, mcReg) doesn't work (mcReg = modelCurrentRegressors). As they are toggles I have run them through factor() to be treated properly as 0 or 1, but because I am grabbing automagically the first 2/3rds of the data, some of the regressors are either all 0s or all 1s, so I need to take them out of the model by hand for now until I find a nice automatic method for removing regressors that have only one level. So primarily: how do I handle names that include spaces in this context? And as a bonus: anyone have a nice method for yanking regressors that have only a single factor level from the lm() function? e.g. (for the following 30 elements) 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 1,1,1,1,1,1,1,1,1,1 As you can see, grabbing the first 2/3rds is all 0s and the last 1/3rd is all ones (doing an in-sample forecast diagnostic: building the model only on the first 2/3rds of the data, then forecasting the next 1/3rd and comparing.) Sorry if I am rambling a bit, still on cup of coffee #1... [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
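Both halves of the question can be sketched in a few lines (the data frame and its contents are invented for illustration):

```r
mcReg <- data.frame(`Small Bank Acquired` = c(0, 1, 0, 1),
                    AllZero = c(0, 0, 0, 0),    # constant regressor
                    y = c(1.2, 0.7, 1.5, 0.9),
                    check.names = FALSE)

# 1) Make the names formula-safe:
names(mcReg) <- make.names(names(mcReg))  # "Small.Bank.Acquired"

# 2) Drop regressors with only one distinct value:
keep  <- sapply(mcReg, function(x) length(unique(x)) > 1)
mcReg <- mcReg[, keep, drop = FALSE]
names(mcReg)
```

Backtick-quoting a name inside the formula (e.g. y ~ . - `Small Bank Acquired`) should also work without renaming, so renaming is a convenience rather than a necessity.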
Re: [R] searching for elements
One more thing, Error in Ops.factor(sigs[i, 1], p0_recent[j, 1]) : level sets of factors are different It seems that this particular problem is due to the fact that you are comparing two sets of factors with different levels, which is what the Ops.factor error is saying. But let's make it more clear: R> f1 <- factor(c('a','b','c')) R> f1 [1] a b c Levels: a b c R> f2 <- factor(c('c','d','e')) R> f2 [1] c d e Levels: c d e R> f1[3] == f2[1] Error in Ops.factor(f1[3], f2[1]) : level sets of factors are different One way to fix this would be to ensure all factors have the same levels from the get-go, like: R> f1 <- factor(c('a','b','c'), levels=c('a','b','c','d','e')) R> f2 <- factor(c('c','d','e'), levels=c('a','b','c','d','e')) R> f1 [1] a b c Levels: a b c d e R> f2 [1] c d e Levels: a b c d e R> f1[3] == f2[1] [1] TRUE -- Steve Lianoglou Graduate Student: Physiology, Biophysics and Systems Biology Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RODBC results from stored procedure
Short of uploading a SQL Server database, I don't think I can make this example reproducible, but I hope it's not so complicated as to require reproducibility. I'm using RODBC to get data from Microsoft SQL Server. I can call a parametrized stored procedure without a problem and the proc does indeed execute successfully. However, even though the proc returns the results, I found that I had to modify the proc so that, in addition to returning the results to the caller, it also saved them to an actual SQL Server table. Then I was able to make a second call to sqlQuery with a simple select statement on the results table and retrieve the results back into R. My question is: can I get stored proc results directly back into R without having to populate and query a results table?

dbdata <- sqlQuery(conn, "sp_GetReturns '2009-07-10','2009-07-14'")
head(dbdata)
character(0)  # no data here

dbdata <- sqlQuery(conn, "select * from returns order by Date asc")
# ...success...
        Date Asset1 Asset2
  2009-07-10  0.01    0.02
  2009-07-13  0.007  -0.003
...

I'd appreciate any suggestions, Andrew W

rm(dbdata)
# run query in query analyzer and get data from rtnsUnderscored table
#dbdata <- sqlFetch(conn, sqtable=sp_GetModelRtns4OptimizeRangeVer4 5000,'2001-10-01','2009-07-14)
#dbdata <- sqlFetch(conn, sqtable=sp_GetModelRtns4OptimizeRangeVer4 5000,'2009-07-01','2009-07-14)
dbdata <- sqlQuery(conn, "select * from rtnsUnderscored order by Date asc")
[R] duplicate data points on a line graph
Hi, I am new to R plot. I am trying to increase the size of a data point when duplicate data points exist:

x  y
1 10
1 10
2  3
4  5
9  8

In the above example (1, 10) would be displayed larger than the other data points. Could someone give me some assistance with this problem?

757-864-7114 LARC/J.L.Shipman/jshipman jeffery.l.ship...@nasa.gov
Re: [R] duplicate data points on a line graph
LARC/J.L.Shipman/jshipman wrote: [duplicate-point plotting question, quoted above]

Not exactly what you want, but ?sunflowerplot might come close.

Dieter
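For the data in the question, a quick sketch: sunflowerplot() draws one "petal" per duplicate, and xyTable() (the helper it uses to count duplicates) returns the multiplicities directly:

```r
x <- c(1, 1, 2, 4, 9)
y <- c(10, 10, 3, 5, 8)

sunflowerplot(x, y)  # the duplicated point (1, 10) gets two petals

# xyTable() tabulates the multiplicity of each distinct (x, y) pair
xy <- xyTable(x, y)
xy$number  # 2 for (1, 10), 1 for each of the other points
```

xy$number is also a convenient cex vector if you prefer scaled plain points over sunflowers.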
Re: [R] duplicate data points on a line graph
On 7/15/2009 2:19 PM, NDC/jshipman wrote: [duplicate-point plotting question, quoted above]

A couple of simple approaches:

x <- c(1,1,2,4,9)
y <- c(10,10,3,5,8)
plot(jitter(x), jitter(y))
plot(x, y, cex=c(2,2,1,1,1))

-- Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
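The cex vector here is hardcoded for five points; for larger data the per-point counts can be computed, e.g. with ave() (a sketch, not part of the original reply):

```r
x <- c(1, 1, 2, 4, 9)
y <- c(10, 10, 3, 5, 8)

# count how many times each (x, y) pair occurs, aligned with the original points
counts <- ave(seq_along(x), paste(x, y), FUN = length)
counts  # 2 2 1 1 1

plot(x, y, cex = counts)
```

paste(x, y) builds a grouping key per (x, y) pair, so ave() replaces each element with its group's size.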
Re: [R] Spaces in a name
Try this with the built-in BOD data set:

> names(BOD) <- paste(names(BOD), "X")
> BOD
  Time X demand X
1      1      8.3
2      2     10.3
3      3     19.0
4      4     16.0
5      5     15.6
6      7     19.8
> lm(`demand X` ~ `Time X`, BOD)

Call:
lm(formula = `demand X` ~ `Time X`, data = BOD)

Coefficients:
(Intercept)     `Time X`
      8.521        1.721

On Wed, Jul 15, 2009 at 10:40 AM, Idgarad idga...@gmail.com wrote: [same spaces-in-names question, quoted above]
Re: [R] duplicate data points on a line graph
Alternatively, you could make use of transparency (on some devices), or use ggplot2 to map the number of observations to the point size:

d <- read.table(textConnection("
x y
1 10
1 10
2 3
4 5
9 8
"), header=TRUE)

library(ggplot2)

# transparency
qplot(x, y, data=d, alpha=I(0.5))

# mapping number of observations to size
d2 <- unique(ddply(d, .(x, y), transform, count=length(x)))
qplot(x, y, data=d2, size=count)

HTH, baptiste

2009/7/15 Chuck Cleland cclel...@optonline.net wrote: [jitter/cex suggestions, quoted above]
Re: [R] RODBC results from stored procedure
tradenet wrote: Short of uploading a SQL Server database, I don't think I can make this example reproducible ... I can call a parametrized stored procedure without a problem and the proc does indeed execute successfully.

This works for me with the popular Northwind database:

channel <- odbcConnect("northwind")  # assume this DSN is configured correctly
sqlQuery(channel, "EXEC CustOrderHist @CustomerID='ALFKI'")

Try with a non-date query first, then switch to the tricky date format in the parameter.

Dieter
[R] GLM Gamma Family logLik formula?
Hello all, I was wondering if someone can enlighten me as to the difference between the logLik in R vis-a-vis Stata for a GLM model with the gamma family. Stata calculates the log-likelihood of the model as (in R notation) some equivalent of

-1/scale * sum(Y/mu + log(mu) + (scale-1)*log(Y) + log(scale) + scale*lgamma(1/scale))

where scale (or dispersion) = 1, Y is the response variable, and mu is the vector of fitted values from the model. R seems to use a very similar approach, but scale is set equal to the calculated dispersion for the gamma model. However, when I calculate the logLik by hand this way the answer differs slightly (by about 0.5) from logLik(glm.m1). I haven't been able to figure out why from the help. If anyone has any ideas, the insight would be much appreciated. Cheers, Skipper
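One plausible source of the gap (an assumption worth checking against the actual model, not a statement from the thread): R's logLik() for a Gamma GLM plugs in the deviance-based dispersion deviance/n, not the Pearson-based dispersion reported by summary(). With that substitution the by-hand gamma log-likelihood matches logLik():

```r
set.seed(1)
x <- runif(100, 1, 5)
y <- rgamma(100, shape = 2, rate = 2 / x)        # gamma response with mean x
fit <- glm(y ~ x, family = Gamma(link = "log"))

# dispersion as used by logLik(): deviance / n (NOT summary(fit)$dispersion)
disp <- fit$deviance / length(y)
mu   <- fitted(fit)

# gamma density with shape = 1/disp and mean mu (so scale = mu * disp)
ll <- sum(dgamma(y, shape = 1 / disp, scale = mu * disp, log = TRUE))

c(byHand = ll, fromR = as.numeric(logLik(fit)))  # agree to rounding error
```

If the by-hand calculation uses summary()'s Pearson dispersion instead, a small constant offset of the kind described is exactly what one would expect.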
[R] strategy to iterate over repeated measures/longitudinal data
Hi Group,

Create some example data:

set.seed(1)
wide_data <- data.frame(
  id = 1:10,
  predictor1 = sample(c("a","b"), 10, replace=TRUE),
  predictor2 = sample(c("a","b"), 10, replace=TRUE),
  predictor3 = sample(c("a","b"), 10, replace=TRUE),
  measurement1 = rnorm(10),
  measurement2 = rnorm(10))

head(wide_data)
  id predictor1 predictor2 predictor3 measurement1 measurement2
1  1          a          a          b  -0.04493361  -0.05612874
2  2          a          a          a  -0.01619026  -0.15579551
3  3          b          b          b   0.94383621  -1.47075238
4  4          b          a          a   0.82122120  -0.47815006
5  5          a          b          a   0.59390132   0.41794156
6  6          b          a          a   0.91897737   1.35867955

The measurements are repeated measures, and I am looking at one predictor at a time. In the actual problem there are around 400,000 predictors (again, one at a time). Now I want to use the multiple measurements (the responses) to run a regression of measurements on a predictor, so I will convert this data from wide to long format and iterate through each predictor. One (inefficient) way is shown below. For each predictor:

1. create a long data set from the predictor and all measurements (using the make.univ function from the multilevel package)
2. run the model and extract the coefficient of interest
3. go to the next predictor

The end result is a vector of 400,000 coefficients. I'm sure this can be improved upon. I will be running this on a Unix cluster with 16G of memory. In the wide format there are 2000 rows (individuals). With 4 repeated measures, it seems converting everything up front could be problematic. Also, I'm not sure how to iterate through that (maybe putting it in a list). Any suggestions? Thanks for your help. Juliet

Here is the inefficient, working code.
library(multilevel)
library(lme4)

# Same data as above
set.seed(1)
wide_data <- data.frame(
  id = 1:10,
  predictor1 = sample(c("a","b"), 10, replace=TRUE),
  predictor2 = sample(c("a","b"), 10, replace=TRUE),
  predictor3 = sample(c("a","b"), 10, replace=TRUE),
  measurement1 = rnorm(10),
  measurement2 = rnorm(10))

# vector of names to iterate over
predictor_names <- colnames(wide_data)[2:4]

# vector to store coefficients
mycoefs <- rep(-1, length(predictor_names))
names(mycoefs) <- predictor_names

for (predictor in predictor_names) {
  long_data <- make.univ(
    data.frame(wide_data$id, wide_data[, predictor]),
    data.frame(wide_data$measurement1, wide_data$measurement2))
  names(long_data) <- c('id', 'predictor', 'time', 'measurement')
  myfit <- lmer(measurement ~ predictor + (1|id), data=long_data)
  mycoefs[predictor] <- myfit@fixef[2]
}
mycoefs
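As an aside, the wide-to-long step can also be done with base reshape(), avoiding the multilevel dependency; a minimal sketch on a cut-down version of the example data (same layout as above, one predictor):

```r
set.seed(1)
wide <- data.frame(id = 1:10,
                   predictor1 = sample(c("a", "b"), 10, replace = TRUE),
                   measurement1 = rnorm(10),
                   measurement2 = rnorm(10))

# stack measurement1/measurement2 into one 'measurement' column, 2 rows per id
long <- reshape(wide, direction = "long",
                varying = c("measurement1", "measurement2"),
                v.names = "measurement",
                timevar = "time", idvar = "id")

nrow(long)  # 20 rows: 10 ids x 2 time points
```

Reshaping once per predictor this way keeps memory proportional to one long frame rather than 400,000 of them.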
[R] storing lm() results and other objects in a list
to clean up some code I would like to make a list of arbitrary length to store various objects for use in a loop. Sample code:

## BEGIN SAMPLE ##
# You can see the need for a loop already
linearModel1 <- lm(modelSource ~ ., mcReg)
linearModel2 <- step(linearModel1)
linearModel3 <- lm(modelSource ~ . - 1, mcReg)
linearModel4 <- step(linearModel3)
# custom
linearModel5 <- lm(modelSource ~ . - ACF - MonthlyST1 - MonthlyST2 - MonthlyBLA, mcReg)

LinearModel1.res <- residuals(linearModel1)
LinearModel2.res <- residuals(linearModel2)
LinearModel3.res <- residuals(linearModel3)
LinearModel4.res <- residuals(linearModel4)
LinearModel5.res <- residuals(linearModel5)

# hmmm, bolt on linearModel[x] as linearModel[x]$arma.fit?
arma1.fit <- auto.arima(LinearModel1.res)
arma2.fit <- auto.arima(LinearModel2.res)
arma3.fit <- auto.arima(LinearModel3.res)
arma4.fit <- auto.arima(LinearModel4.res)
arma5.fit <- auto.arima(LinearModel5.res, stepwise=T, trace=T)

# Ok, what is left over after regression and ARIMA that cannot
# be explained? Stupid outliers.
# AOs can be added to cReg as a normal dummy variable,
# but these are AOs from the model, not the original data.
# Is it better to handle AOs from the original data?
# linearModel[x]$arma.ao?
arma1.ao <- detectAO(arma1.fit)
arma2.ao <- detectAO(arma2.fit)
arma3.ao <- detectAO(arma3.fit)
arma4.ao <- detectAO(arma4.fit)
arma5.ao <- detectAO(arma5.fit)

# What do I do with an innovative outlier? Transfer function or what?
# auto.arima doesn't handle the IO=c(...) stuff. Umm...
# Transfer functions, etc. are a deficiency in the script at this point.
# linearModel[x]$arma.io?
arma1.io <- detectIO(arma1.fit)
arma2.io <- detectIO(arma2.fit)
arma3.io <- detectIO(arma3.fit)
arma4.io <- detectIO(arma4.fit)
arma5.io <- detectIO(arma5.fit)

# Sample of how to auto-grab regressors from detectAO and detectIO and
# append them to our regression array. You'd have to do this for each model,
# as the residuals are where the outliers come from and different models
# would have different residuals left over.
# IO is best left to arimax functions directly. I assume at this point that AOs can be added to regression tables;
# if that is the case then REM out the IO lines and pass the detectIO results
# into arimax(x, y, z, IO=detectIO(blah)).
#
# Need a better understanding of how to address the AOs and IOs in this script before implementing them
# (repeat for each model: cReg1, cReg2, etc.)
#
#cReg1 <- cReg
#fReg1 <- fReg
#for(i in arma1.io$ind){ print(i); cReg1[, paste(sep="", "IO", i)] <- 1*(seq(cReg1[,2]) == i) }
#for(i in arma1.ao$ind){ print(i); cReg1[, paste(sep="", "AO", i)] <- 1*(seq(cReg1[,2]) == i) }
#for(i in arma1.io$ind){ print(i); fReg1[, paste(sep="", "IO", i)] <- 1*(seq(fReg1[,2])) }
#for(i in arma1.ao$ind){ print(i); fReg1[, paste(sep="", "AO", i)] <- 1*(seq(fReg1[,2])) }

# Get the (p,d,q) and (P,D,Q) into a variable so we can re-feed them if necessary.
# Oh crap, absorbing this into linearModel[x] looks ugly for syntax.
# Note: the $arma slot is (p, q, P, Q, period, d, D), so order is arma[c(1, 6, 2)].
arma1.fit$order <- c(arma1.fit$arma[1], arma1.fit$arma[6], arma1.fit$arma[2])
arma2.fit$order <- c(arma2.fit$arma[1], arma2.fit$arma[6], arma2.fit$arma[2])
arma3.fit$order <- c(arma3.fit$arma[1], arma3.fit$arma[6], arma3.fit$arma[2])
arma4.fit$order <- c(arma4.fit$arma[1], arma4.fit$arma[6], arma4.fit$arma[2])
arma5.fit$order <- c(arma5.fit$arma[1], arma5.fit$arma[6], arma5.fit$arma[2])
arma1.fit$seasonal <- c(arma1.fit$arma[3], arma1.fit$arma[7], arma1.fit$arma[4])
arma2.fit$seasonal <- c(arma2.fit$arma[3], arma2.fit$arma[7], arma2.fit$arma[4])
arma3.fit$seasonal <- c(arma3.fit$arma[3], arma3.fit$arma[7], arma3.fit$arma[4])
arma4.fit$seasonal <- c(arma4.fit$arma[3], arma4.fit$arma[7], arma4.fit$arma[4])
arma5.fit$seasonal <- c(arma5.fit$arma[3], arma5.fit$arma[7], arma5.fit$arma[4])

# These two are used for linearModel2 and linearModel4: get only the
# regressors that survived step removal.
newcReg <- cReg[match(names(linearModel2$coeff[-1]), names(cReg))]
newfReg <- fReg[match(names(linearModel2$coeff[-1]), names(fReg))]
newmcReg <- mcReg[match(names(linearModel2$coeff[-1]), names(mcReg))]
newmfReg <- mfReg[match(names(linearModel2$coeff[-1]), names(mfReg))]

# Scenario 1 - all regressors left in
newFit1.b <- Arima(modelSource, order=arma1.fit$order, seasonal=list(order=arma1.fit$seasonal), xreg=mcReg, include.drift=F)
# Scenario 2 - step removal of regressors
newFit2.b <- Arima(modelSource, order=arma2.fit$order, seasonal=list(order=arma2.fit$seasonal), xreg=newmcReg, include.drift=F)
# Scenario 3 - all regressors left in, with intercept removed
newFit3.b <- Arima(modelSource, order=arma3.fit$order, seasonal=list(order=arma3.fit$seasonal), xreg=mcReg, include.drift=F)
# Scenario 4 - step removal of regressors with intercept removed (I have a feeling this is identical to #2 in results)
newFit4.b <- Arima(modelSource, order=arma4.fit$order, seasonal=list(order=arma4.fit$seasonal), xreg=newmcReg, include.drift=F)
# Scenario 5 - robust1, for giggles and grins for now
newFit5.b <- Arima(modelSource, order=arma5.fit$order, seasonal=list(order=arma5.fit$seasonal), xreg=newmcReg, include.drift=F)
# All the drift terms ones are broke as the drift
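The repeated numbered variables above can be collapsed by keeping everything in lists indexed by model number. A minimal sketch with a toy data frame standing in for mcReg/modelSource (names assumed from the post), using plain lm() so it runs without the forecast/TSA packages:

```r
set.seed(1)
# toy stand-in for the real regressor frame; modelSource is the response column
mcReg <- data.frame(modelSource = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))

# one formula per model variant; the fitted objects, residuals, and any
# downstream results (arma fits, outlier tables, ...) live in parallel lists
formulas <- list(m1 = modelSource ~ .,
                 m3 = modelSource ~ . - 1)
models   <- lapply(formulas, function(f) lm(f, data = mcReg))
resids   <- lapply(models, residuals)

sapply(resids, length)  # 20 residuals per model
```

From here the arma fits would just be another `lapply(resids, auto.arima)` over the same list, so adding a sixth model means adding one formula, not five new variables.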