[R] degree of freedom of interaction effect
Hi there, Suppose two factors, A and B. A has n levels, B has m levels. Why are the degrees of freedom for the interaction effect of A and B (here I mean A:B, not A*B) equal to (n-1)*(m-1) rather than n*m-1? I know this is a basic statistical question. Please recommend some good websites or books. Thank you very much. Fan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
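[The (n-1)*(m-1) can be verified directly in R with a small two-way layout; this sketch (data invented here, not from the thread) shows the Df column of the ANOVA table: one df for the intercept, n-1 for A, m-1 for B, and the remaining (n-1)*(m-1) for A:B.]

```r
# Sketch: degrees of freedom in a two-way ANOVA with replication.
set.seed(1)
n <- 3; m <- 4                                  # levels of A and B
dat <- expand.grid(A = factor(seq_len(n)),
                   B = factor(seq_len(m)),
                   rep = 1:2)
dat$y <- rnorm(nrow(dat))
fit <- aov(y ~ A * B, data = dat)
# Df column: A = n-1 = 2, B = m-1 = 3, A:B = (n-1)*(m-1) = 6
summary(fit)
```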
[R] cex in xlab, ylab and zlab of persp
I've had no success using what I think is the correct code to change the default size of the x, y and z labels in a persp graph (i.e. cex.lab). Is there a particular specification to accomplish this? Many thanks, Joe Retzer
Re: [R] replacing all NA's in a dataframe with zeros...
On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:

> Since you can index a matrix or dataframe with a matrix of logicals, you can use is.na() to index all the NA locations and replace them all with 0 in one command.

A quicker solution that, IIRC, was posted to the list by Peter Dalgaard several years ago is:

  sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})

Some timings on a larger problem with 100 columns:

  mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 1000*100, replace = TRUE), nrow = 1000))
  system.time(retval <- sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x}))
  [1] 0.108 0.008 0.120 0.000 0.000
  system.time(mydata.df[is.na(mydata.df)] <- 0)
  [1] 2.460 0.028 2.498 0.000 0.000

And a larger problem still, with 1000 columns:

  mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 1000*1000, replace = TRUE), nrow = 1000))
  system.time(retval <- sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x}))
  [1] 0.908 0.068 2.657 0.000 0.000
  system.time(mydata.df[is.na(mydata.df)] <- 0)
  [1] 43.127 0.332 46.440 0.000 0.000

Profiling mydata.df[is.na(mydata.df)] <- 0 shows that it spends most of this time subsetting the individual cells of the data frame in turn and setting the NA ones to 0.
HTH

G

  > mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 30, replace = TRUE), nrow = 6))
  > mydata.df
    V1 V2 V3 V4 V5
  1  1 NA  1  1  1
  2  1 NA NA NA  1
  3 NA NA  1 NA NA
  4 NA NA NA NA  1
  5 NA  1 NA NA  1
  6  1 NA NA  1  1
  > is.na(mydata.df)
       V1    V2    V3    V4    V5
  1 FALSE  TRUE FALSE FALSE FALSE
  2 FALSE  TRUE  TRUE  TRUE FALSE
  3  TRUE  TRUE FALSE  TRUE  TRUE
  4  TRUE  TRUE  TRUE  TRUE FALSE
  5  TRUE FALSE  TRUE  TRUE FALSE
  6 FALSE  TRUE  TRUE FALSE FALSE
  > mydata.df[is.na(mydata.df)] <- 0
  > mydata.df
    V1 V2 V3 V4 V5
  1  1  0  1  1  1
  2  1  0  0  0  1
  3  0  0  1  0  0
  4  0  0  0  0  1
  5  0  1  0  0  1
  6  1  0  0  1  1

Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: [EMAIL PROTECTED]
tel: 604-675-8000 x7561
BCCRC Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. V5Z 1L3 Canada

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of David L. Van Brunt, Ph.D.
Sent: Wed 3/14/2007 5:22 PM
To: R-Help List
Subject: [R] replacing all NA's in a dataframe with zeros...

I've seen how to replace the NA's in a single column of a data frame:

  mydata$ncigs[is.na(mydata$ncigs)] <- 0

But this is just one column... I have thousands of columns (!) for which I need to do this, and I can't figure out a way, outside of the dreaded loop, to replace all NA's in an entire data frame (all vars) without naming each var separately. Yikes. I'm racking my brain on this; it seems like I must be staring at the obvious, but it eludes me. Searches have come up CLOSE, but not quite what I need. Any pointers?
-- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [t] +44 (0)20 7679 0522 ECRC [f] +44 (0)20 7679 0565 UCL Department of Geography Pearson Building [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street London, UK [w] http://www.ucl.ac.uk/~ucfagls/ WC1E 6BT [w] http://www.freshwaters.org.uk/ %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Re: [R] replacing all NA's in a dataframe with zeros...
Gavin Simpson wrote:

> On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote: Since you can index a matrix or dataframe with a matrix of logicals, you can use is.na() to index all the NA locations and replace them all with 0 in one command.
> A quicker solution that, IIRC, was posted to the list by Peter Dalgaard several years ago is: sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})

I hope your memory fails you, because it doesn't actually work.

  > sapply(test.df, function(x) {x[is.na(x)] <- 0; x})
       x1 x2 x3
  [1,]  0  1  1
  [2,]  2  2  0
  [3,]  3  3  0
  [4,]  0  4  4

is a matrix, not a data frame. Instead:

  > test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x})
  > test.df
    x1 x2 x3
  1  0  1  1
  2  2  2  0
  3  3  3  0
  4  0  4  4

Speedwise, sapply() is doing lapply() internally, and the assignment overhead should be small, so I'd expect similar timings.

-- 
O__ Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
Re: [R] Connecting R-help and Google Groups?
On Wed, Mar 14, 2007 at 06:30:53PM -0500, Ranjan Maitra wrote:

> I agree with Bert on this one! Any commercial entity's future policies will not be decided by some group's past understanding. Everything can be explained in terms of shareholder value.

Personally, I would go even further. I think it is very dangerous to allow Google and other search engines to search the R web pages and mailing lists; who knows what they'll do with this information some day, or maybe they're already doing it right now! All search engines should be banned from these sites, IMHO. Personally, I wouldn't even allow these pages to be downloaded with Internet Explorer, Safari, Netscape, Firefox or any commercial web browser. Also, the R team shouldn't support R on any commercial platforms like MS Windows, OS X, Red Hat or Debian Linux, etc. There is a considerable chance that the policies of these companies will affect the R community negatively.

> I don't see any advantages with tying up to Google Groups. We get enough posts every day here to keep us all busy with even a fraction of them. I also think people should be encouraged to follow the policies, such as reading the basics (An Introduction to R, etc.), before running off and posting. Besides, and most importantly, I prefer having statisticians, or those interested in statistics applied to their problems, discuss their issues and software, and I learn a lot on this mailing list even in lurk mode. I could do without random posters.

I completely agree. The R user community is big enough right now, and it is clear that we want no more newbies with annoying questions. Nor do we want more statistics professors; it is our right and duty to answer all the questions about R!

> Btw, anyone using R should be encouraged to use RSiteSearch to search this mailing list on some topic.

Yes. I don't even understand why it is possible to search the mailing list via a web form. Shutting down that service would be a great user filter!
Only advanced users who are able to install R and find RSiteSearch should be allowed to see the posts. As for posting, I recommend setting up a test with questions about statistics, programming and other relevant topics; only users passing the test would be allowed to submit posts to the lists. It would also be wise to make them repeat the test every year, just in case they forget something.

Best, Gabor

Best, Ranjan

On Wed, 14 Mar 2007 13:36:33 -0400 Paul Lynch [EMAIL PROTECTED] wrote: Well, I don't see what danger could arise from the fact that Google Groups is owned by a company. Google Groups provides access to all of Usenet, plus many mailing lists (e.g. the ruby-talk mailing list for Ruby programmers). They don't control any of the newsgroups or mailing lists that they provide access to. It is a free service, supported by advertising. As for the issue of whether there might be future access problems (e.g. if Google goes bankrupt, which currently seems unlikely), R users would still have access to the r-help list through the means that they have now. I am not recommending replacing any of the current means of access to the r-help list; I am just asking about adding an additional means of access. --Paul

On 3/14/07, Bert Gunter [EMAIL PROTECTED] wrote: I know nothing about Google Groups, but FWIW, I think it would be most unwise for R/CRAN to hook up to **any** commercially sponsored web portals. Future changes in their policies, interfaces, or access conditions may make them inaccessible or unfriendly to R users. So long as we have folks willing and able to host and maintain our lists as part of the CRAN infrastructure, CRAN maintains control. I think this is wise and prudent. I am happy to be educated to the contrary if I misunderstand how this would work.
Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404 650-467-7374 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Lynch Sent: Wednesday, March 14, 2007 8:48 AM To: R-help@stat.math.ethz.ch Subject: [R] Connecting R-help and Google Groups? This morning I tried to see if I could find the r-help mailing list on Google Groups, which has an interface that I like. I found three Google Groups (The R Project for Statistical Computing, rproject, and rhelp) but none of them are connected to the r-help list. Is there perhaps some reason why it wouldn't be a good thing for there to be a connected Google Group? I think it should be possible to set things up so that a post to the Google Group goes to the r-help mailing list, and vice-versa. Also, does anyone know why the three existing R Google Groups failed to get connected to r-help? It might require some action on the part of the r-help list administrator. Thanks, --Paul -- Paul Lynch Aquilent,
Re: [R] fitting of all possible models
Dear Lukas, although I'm intrigued by the purpose of what you are trying to do, as mentioned by some of the other persons on this list, I liked the challenge of writing such a function. I came up with the following during some train travelling this morning:

tum <- function(x) {
  tum <- matrix(data = NA, nrow = 2^x, ncol = x)
  for (i in 1:x) {
    tum[, i] <- c(rep(NA, 2^i/2), rep(i + 1, 2^i/2))
  }
  return(tum)
}

###

all.models <- function(model) {
  npred <- length(model$coefficients) - 1
  matr.model <- tum(npred)
  output <- matrix(data = NA, nrow = 2^npred, ncol = 1)
  for (t in 2:2^npred) {
    preds <- names(model$coefficients)
    interc <- names(model$coefficients)[1]
    form <- as.formula(paste(". ~", paste(preds[na.omit(matr.model[t, ])], collapse = "+")))
    model2 <- update(model, form)
    output[t, ] <- mean(resid(model2)^2)
  }
  return(output)
}

##

As you can see, I used a helper function (tum, "the ultimate matrix") alongside the actual function. Also, I wrote it using lm instead of glm, but I suppose you can easily alter that. As well, the function now returns just a regular fit measurement, but that is not all that essential, I think. The main point is: it works! Using this on my G4 Mac, with an lm of 10 predictors and 18 cases, it returns the output quite fast (1 minute). I hope you can put this to use. It needs some easy adapting to your specific needs, but I don't expect that to be a problem. If you need help with that, please contact me. I'd appreciate hearing from you if this function is helpful in any way. Sincerely, Rense Nieuwenhuis

On Feb 27, 2007, at 8:46, Indermaur Lukas wrote: Hi, fitting all possible models (GLM) with 10 predictors will result in loads of (2^10 - 1) models. I want to do that in order to get the importance of variables (having an unbalanced variable design) by summing up the AIC weights of models including the same variable, for every variable separately. It's time-consuming and annoying to define all possible models by hand.
Is there a command, or an easy solution, to let R define the set of all possible models itself? I defined models in the following way to process them with a batch job:

# e.g. model 1
preference <- formula(Y ~ Lwd + N + Sex + YY)
# e.g. model 2
preference_heterogeneity <- formula(Y ~ Ri + Lwd + N + Sex + YY)

etc. etc. I appreciate any hint. Cheers, Lukas

°°° Lukas Indermaur, PhD student eawag / Swiss Federal Institute of Aquatic Science and Technology ECO - Department of Aquatic Ecology Überlandstrasse 133 CH-8600 Dübendorf Switzerland Phone: +41 (0) 71 220 38 25 Fax: +41 (0) 44 823 53 15 Email: [EMAIL PROTECTED] www.lukasindermaur.ch
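[For comparison, a compact alternative to hand-defined formulas (my own sketch, not from the thread; the data frame and predictor names here are invented): generate every non-empty subset of predictors with combn() and build each formula with reformulate().]

```r
# Sketch: fit all 2^p - 1 non-empty predictor subsets and collect AICs.
# 'dat' and its column names are invented example data.
set.seed(42)
dat <- data.frame(Y   = rnorm(20), Ri  = rnorm(20), Lwd = rnorm(20),
                  N   = rnorm(20), Sex = rnorm(20))
preds <- c("Ri", "Lwd", "N", "Sex")

# All subsets of sizes 1..length(preds):
subsets <- unlist(lapply(seq_along(preds),
                         function(k) combn(preds, k, simplify = FALSE)),
                  recursive = FALSE)

fits <- lapply(subsets, function(s) glm(reformulate(s, response = "Y"),
                                        data = dat))
aics <- sapply(fits, AIC)
length(aics)   # 2^4 - 1 = 15 candidate models
```

From here, AIC weights per variable can be obtained by summing over the models whose subset contains that variable.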
Re: [R] cex in xlab, ylab and zlab of persp
Internally, the labels in persp are drawn in the same way as with the text() function. So you cannot change their size with cex.lab, but you can with cex: persp(x, y, z, cex=1.5) gives larger labels in a persp 3D plot. Of course there may be side effects, because cex may change the size of things other than the labels. HTH

On 3/15/07, Joseph Retzer [EMAIL PROTECTED] wrote: I've had no success using what I think is the correct code to change the default size of the x, y and z labels in a persp graph (i.e. cex.lab). Is there a particular specification to accomplish this? Many thanks, Joe Retzer
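[Following that advice, a minimal sketch (surface data invented for illustration): pass cex rather than cex.lab to persp().]

```r
# Sketch: enlarging persp() labels via cex (cex.lab has no effect here).
x <- seq(-2, 2, length.out = 30)
y <- x
z <- outer(x, y, function(a, b) exp(-(a^2 + b^2)))
pm <- persp(x, y, z, xlab = "East", ylab = "North", zlab = "Density",
            theta = 30, phi = 25, cex = 1.5)
```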
Re: [R] replacing all NA's in a dataframe with zeros...
On Thu, Mar 15, 2007 at 10:21:22AM +0100, Peter Dalgaard wrote:

> Gavin Simpson wrote: A quicker solution that, IIRC, was posted to the list by Peter Dalgaard several years ago is: sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})
> I hope your memory fails you, because it doesn't actually work. sapply(test.df, function(x) {x[is.na(x)] <- 0; x}) [...] is a matrix, not a data frame. Instead: test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x}). Speedwise, sapply() is doing lapply() internally, and the assignment overhead should be small, so I'd expect similar timings.

Just an idea: given the order-of-magnitude difference (a factor of 17 or so) in runtime between the obvious solution and the fast one, wouldn't it be possible/sensible to modify the corresponding subsetting method ([.data.frame) so that it recognizes the case when it is called with an arbitrary index matrix (the problem is not restricted to indexing with a logical matrix, I presume?) and switches internally to the fast solution given above?

In my (admittedly limited) experience, it seems that one of the not-so-nice properties of R is that in quite a few situations one encounters exactly the above: unexpected, massive differences in run time between different solutions (I'm not talking about the explicit loop penalty). What concerns me most are the very basic scenarios (not complex algorithms): data frames vs. matrices, naming vector components or not, subsetting, read.table vs. scan, etc. If there were a concise HOWTO list for the cases when speed matters, that would be helpful, too.
I understand that part of the uneven performance is unavoidable, and one must expect the user to go to the trouble of understanding the reasons, e.g. for the differences between handling purely numerical data in matrices versus data frames. But a factor of 17 between the obvious approach and the wise one seems a trap into which 99% of people will step (probably never thinking that there might be a faster approach). joerg
Re: [R] replacing all NA's in a dataframe with zeros...
On Thu, 2007-03-15 at 10:21 +0100, Peter Dalgaard wrote:

> Gavin Simpson wrote: A quicker solution that, IIRC, was posted to the list by Peter Dalgaard several years ago is: sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})
> I hope your memory fails you, because it doesn't actually work.

Ah, yes, apologies Peter. I have the sapply version embedded in a package function that I happened to be working on (where I wanted the result to be a matrix), and I pasted directly from there rather than from my crib sheet of useful R-help snippets, where I do have it as lapply(...). I'd forgotten I'd changed Peter's suggestion slightly in my function. That'll teach me to reply before my morning cup of Earl Grey.

All the best,

G

> sapply(test.df, function(x) {x[is.na(x)] <- 0; x}) returns a matrix, not a data frame. Instead: test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x}). Speedwise, sapply() is doing lapply() internally, and the assignment overhead should be small, so I'd expect similar timings.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
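[For reference, the two approaches discussed in this thread condense into one runnable sketch (timings omitted; they vary by machine), confirming that both produce identical results:]

```r
# Sketch: two equivalent ways to zero out NAs in a data frame.
set.seed(1)
df1 <- as.data.frame(matrix(sample(c(NA, 1), 200, replace = TRUE), nrow = 20))
df2 <- df1

df1[is.na(df1)] <- 0   # the "obvious" way; slow on very wide data frames
df2[] <- lapply(df2, function(x) { x[is.na(x)] <- 0; x })  # the fast way

identical(df1, df2)    # TRUE
```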
Re: [R] reading raw matrix saved with writeBin
FYI, to save data as bitmap images, see the EBImage package on Bioconductor. /H

On 3/15/07, Ranjan Maitra [EMAIL PROTECTED] wrote: On Wed, 14 Mar 2007 18:45:53 -0700 (PDT) Milton Cezar Ribeiro [EMAIL PROTECTED] wrote:

> Dear Friends, I saved a matrix, which contains values 0 and 1, using the following command: writeBin(as.integer(mymatrix), myfile.raw, size=1). It is working fine and I can see the matrix using Photoshop. But now I need to read the matrices back into R (in fact I have a thousand of them), and when I try something like mat.dat <- readBin(myfile.raw, size=1) I can't access the matrix. Kind regards, Miltinho

Look up the help file. There is an explicit example. Basically, you need to tell readBin what type of data to read from the binary file. In fact, I am a little surprised your first command works while writing. HTH! Ranjan
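[To make the round trip concrete, here is a minimal sketch (the file name and matrix size are invented): readBin() must be told the type to read (what = "integer"), the element size in bytes, and how many values to read, and the dimensions must be restored afterwards.]

```r
# Sketch: write a 0/1 matrix as raw bytes, then read it back.
mymatrix <- matrix(sample(0:1, 12, replace = TRUE), nrow = 3)
f <- tempfile(fileext = ".raw")   # stand-in for "myfile.raw"
writeBin(as.integer(mymatrix), f, size = 1)

# readBin needs the type, the element size in bytes, and the count;
# then reshape to the original dimensions.
vals <- readBin(f, what = "integer", size = 1, n = 3 * 4)
mat.dat <- matrix(vals, nrow = 3)
identical(mat.dat, mymatrix)   # TRUE
```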
[R] regarding cointegration
Hi All, Thanks for supporting people like me. What is cointegration, and what is its connection with the Granger causality test? What is its use, and what is the mathematical methodology behind it? Secondly, is a cointegration test like the Phillips-Ouliaris Cointegration Test of the tseries package, or those of the urca package, the same as cointegration? Please tell me how to go about it and how to interpret the results. Thanks in advance. Cheers :-) -gaurav
[R] What happened to the rpm package ?
Hi List, sorry if this is a FAQ: I could not make my way to it :-( Once upon a time there was a package named rpm, for Robust Point Matching. I can find a few traces of it in the CRAN archives: works by Saussen et al. (Aligning spectra with R), but I can't find the package anymore. No trace at all in the Old or Orphaned lists. Does any useR know about it, or even use it? Thanks in advance, and regards to all. Olivier -- Olivier ETERRADOSSI Maître-Assistant CMGD / Equipe Propriétés Psycho-Sensorielles des Matériaux Ecole des Mines d'Alès Hélioparc, 2 av. P. Angot, F-64053 PAU CEDEX 9 tel std: +33 (0)5.59.30.54.25 tel direct: +33 (0)5.59.30.90.35 fax: +33 (0)5.59.30.63.68 http://www.ema.fr
[R] Creating q and p functions from a self-defined distribution
Hello all, I am fishing for some suggestions on efficient ways to make qdist- and pdist-type functions from an arbitrary distribution whose probability density function I've defined myself. For example, let's say I have a distribution whose pdf is:

dRN <- function(x, d, v, sv)  # d, v, and sv are parameters
  return(d/x^2/sv/sqrt(2*pi)*exp(-(d - v*x)^2/2/(sv^2*x^2)))

This is a legitimate distribution over the reals (though it has a singularity at x=0). At the moment, to get a p value, I sum dRN over some arbitrary interval dt:

pRN <- function(x, d, v, sv, dt = .001) {
  xs <- seq(0, x, dt)
  dRN <- dRN(xs, d, v, sv)
  return(sum(dRN)*dt)
}

which is OK, except I sometimes want a vector of p values for a vector of x's, say pRN(xs=seq(0,10,.1) ... ), which involves an unwieldy loop. Getting a quantile value is a real nightmare! Right now I have a thing that involves using approx() and matching rounded numbers, requires a loop, and is slow. It seems surprising that it would be so hard to invert a function that is perfectly defined! Are there packages/functions/algorithms that allow one to manipulate arbitrarily defined distributions? Any suggestions appreciated. Thanks, Eli * Eli Gurarie Quantitative Ecology and Resource Management University of Washington, Seattle
[R] expm() within the Matrix package
Hi, Could anybody give me a bit of advice on some code I'm having trouble with? I've been trying to calculate the log-likelihood of a function iterated over a vector of time values, and I seem to be experiencing difficulty when I use the function expm(). Here's an example of what I am trying to do:

y <- c(5, 10)                     # vector of 2 survival times
p <- Matrix(c(1, 0), 1, 2)        # 1x2 matrix
Q <- Matrix(c(1, 2, 3, 4), 2, 2)  # 2x2 matrix
q <- Matrix(c(1, 2), 2, 1)        # 2x1 matrix
Loglik <- rep(0, length(y))       # empty vector of same length as y
for (i in 1:length(y)) {
  # calculating the log-likelihood for each y[i]
  Loglik[i] <- log((p %*% (expm(Q*y[i]))) %*% q)
}

The code works perfectly if I use exp(Q*y[i]) but not for expm(). If anyone has any advice they could give, that would be great. I would also like to thank Gad Abraham for helping me get this far. Thanks in advance, Laura
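[One guess at a working version (a sketch, not a confirmed diagnosis from the thread): the product p %*% expm(Q*y[i]) %*% q is a 1x1 Matrix object rather than a plain number, so coercing it with as.numeric() before taking the log and storing it in an ordinary vector may be what is missing.]

```r
library(Matrix)

y <- c(5, 10)                     # two survival times
p <- Matrix(c(1, 0), 1, 2)        # 1x2
Q <- Matrix(c(1, 2, 3, 4), 2, 2)  # 2x2
q <- Matrix(c(1, 2), 2, 1)        # 2x1

Loglik <- numeric(length(y))
for (i in seq_along(y)) {
  val <- p %*% expm(Q * y[i]) %*% q   # 1x1 Matrix object
  Loglik[i] <- log(as.numeric(val))   # coerce before storing in a plain vector
}
Loglik
```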
[R] Occasional problems with starting batch mode
I use a Windows Server 2003 machine to run R code in batch mode every night using the following command:

F:\Program Files\R\R-2.4.1pat\bin\R.exe CMD BATCH --vanilla --slave batch_master_dr.R

This produces an output file, of which the first three lines look like this:

Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: R2HTML

99% of the time this works great. Every once in a while, I get the following error message instead, and it does not run any of my subsequent code:

Loading required package: tcltk
Loading Tcl/Tk interface ... Unable to register TkTopLevel class
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.

Does anyone have any clue where I should look to fix this problem? Thanks, Roger

version.string R version 2.4.1 Patched (2007-02-04 r40647)
Re: [R] regarding cointegration
Please do not mess up the thread by posting as a reply to some other topic. Thanks, Ranjan

On Thu, 15 Mar 2007 16:51:20 +0530 [EMAIL PROTECTED] wrote: Hi All, Thanks for supporting people like me. What is cointegration, and what is its connection with the Granger causality test? What is its use, and what is the mathematical methodology behind it? Secondly, is a cointegration test like the Phillips-Ouliaris Cointegration Test of the tseries package, or those of the urca package, the same as cointegration? Please tell me how to go about it and how to interpret the results. Thanks in advance. Cheers :-) -gaurav
[R] Mauchly.test
Hello, these are data from different treatments (EU) measured at different dates. I would like to compare the k means of DW according to these two variables, date and EU, but an n-factor ANOVA (aov) is not the right solution, because there is a dependence between wk2 and wk1, between wk3 and wk1-wk2, and so on (matched samples). I would like to know which tests can handle this problem; I started with Mauchly's test (mauchly.test) without success. Can you help me? Thanks. Sébastien

Here are my data:

date  EU   DW
wk1   EU1  5.324547829
wk1   EU1  7.321253265
wk1   EU1  4.431712065
wk1   EU2  8.230322407
wk1   EU2  8.546873269
wk1   EU2  5.657332069
wk1   EU3  3.165508618
wk1   EU3  4.431712065
wk1   EU3  1.899305171
wk2   EU1  2.163097556
wk2   EU1  17.61379438
wk2   EU1  15.82754309
wk2   EU2  16.46064481
wk2   EU2  19.30960257
wk2   EU2  13.92823792
wk2   EU3  6.014466374
wk2   EU3  7.280669822
wk2   EU3  5.064813789
wk4   EU1  11.03179753
wk4   EU1  29.75578101
wk4   EU1  22.71252433
wk4   EU2  27.85647584
wk4   EU2  36.71989997
wk4   EU2  20.11680727
wk4   EU3  13.59661321
wk4   EU3  13.2951362
wk4   EU3  14.56133964
wk6   EU1  30.73875474
wk6   EU1  33.27842393
wk6   EU1  35.27512937
wk6   EU2  31.2817185
wk6   EU2  41.53147307
wk6   EU2  35.57093758
wk6   EU3  8.652390223
wk6   EU3  18.6359174
wk6   EU3  15.30807501
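[In R, Mauchly's sphericity test for this kind of repeated-measures layout operates on a multivariate linear model fit (one row per subject, one column per week). A minimal sketch with invented data, not Sébastien's:]

```r
# Sketch: Mauchly's sphericity test on a multivariate lm fit.
# Rows = subjects, columns = repeated measurements (e.g. wk1, wk2, wk4).
set.seed(1)
dw <- matrix(rnorm(30, mean = 10), ncol = 3,
             dimnames = list(NULL, c("wk1", "wk2", "wk4")))
fit <- lm(dw ~ 1)           # mlm: intercept-only model per column
mauchly.test(fit, X = ~ 1)  # sphericity of within-subject contrasts
```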
Re: [R] Creating q and p functions from a self-defined distribut
On 15-Mar-07 12:09:42, Eli Gurarie wrote:

> Hello all, I am fishing for some suggestions on efficient ways to make qdist- and pdist-type functions from an arbitrary distribution whose probability density function I've defined myself. For example, let's say I have a distribution whose pdf is:
>
> dRN <- function(x, d, v, sv)  # d, v, and sv are parameters
>   return(d/x^2/sv/sqrt(2*pi)*exp(-(d - v*x)^2/2/(sv^2*x^2)))
>
> This is a legitimate distribution over the reals (though it has a singularity at x=0). [...] It seems surprising that it would be so hard to invert a function that is perfectly defined! Are there packages/functions/algorithms that allow one to manipulate arbitrarily defined distributions?

Do not be surprised! The Normal distribution itself, with pdf (1/(sv*sqrt(2*pi)))*exp(-((x - mu)^2)/(2*sv^2)), is perfectly well defined. Yet the computational literature has, over decades, presented procedure after procedure for computing the cumulative function, and its inverse, to desired precision. None of these is simple. Indeed (to use your own word), the Normal distribution is a nightmare from this point of view. So being well defined is no guarantee that computing its p and q values will be a simple or easy problem. And what kind of method is good for a particular distribution will depend strongly on what the distribution is (for instance, whether it has tails which tend rapidly to 0 like the Normal, whether there are good asymptotic formulae, etc.).

In the case of the example you give above, the transformation from x to u = 1/x will translate it into a Normal distribution problem, after which you can use (circumspectly ...) the R functions pnorm and qnorm (which are based on the best from the above literature) to deal with it. But you give it only as an example ... and you are asking for methods for a general user-defined distribution. For the reasons given above, a good general-purpose method is unlikely to exist. With best wishes, Ted.
E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 15-Mar-07 Time: 13:26:27 -- XFMail --
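[That said, for a one-off user-defined density a purely numerical recipe is often workable (my sketch, not from the thread; the density here is a simple stand-in, not the poster's dRN): build the CDF with integrate() and invert it with uniroot().]

```r
# Sketch: numeric p- and q-functions for an arbitrary density on (0, Inf).
dmy <- function(x) dexp(x, rate = 2)   # stand-in density

# CDF: integrate the density from 0 up to each q (vectorised via sapply).
pmy <- function(q) sapply(q, function(qi)
  integrate(dmy, lower = 0, upper = qi)$value)

# Quantile function: solve pmy(x) = p numerically.
qmy <- function(p) sapply(p, function(pi)
  uniroot(function(x) pmy(x) - pi, lower = 1e-8, upper = 100)$root)

pmy(c(0.5, 1))   # vectorised CDF values
qmy(0.5)         # should be near log(2)/2, the Exp(rate = 2) median
```

The bracketing interval for uniroot() is an assumption here; in practice it should cover the support region where the target probability can fall.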
[R] Hardware for a new Workstation for best performance using R
Hi, we are looking for a new workstation for big datasets/applications (easily up to 100'000 records and up to 300 variables) using R. As an example: variable selection for a multivariate regression using stepAIC. What is the best configuration for a workstation to reach a high performance level for computing with R? Single core or multi core (is R together with the nws package really able to take advantage of multi-core processors; any experience/benchmarks on that)? Should we use Linux instead of Windows? If yes, how compatible are graphics computed on Linux if we would like to use them afterwards on Windows? And what are the advantages of using Linux instead of Windows? What kind of workstations are you using (hardware and operating system) for big data computations? And are you satisfied with them? I'm quite familiar with PC and server hardware. Thanks in advance Sascha Morach
[R] Error in upgrade
Dear All, update.packages(ask='graphics') --- Please select a CRAN mirror for use in this session --- Error in .readRDS(pfile) : unknown input format. ??? TIA Giovanni -- dr. Giovanni Parrinello Department of Biotechnologies Medical Statistics Unit University of Brescia Viale Europa, 11 25123 Brescia email: [EMAIL PROTECTED] Phone: +390303717528 Fax: +390303717488
Re: [R] Hardware for a new Workstation for best performance using R
I can speak to some of these issues. I don't know how much benefit you can get from SMP for *single* instances of R, though. 1.) Multicore will be helpful, at least, if you are running several instances of R at once. So, for example, if you have people running two different models at the same time, the OS can use separate processors or cores for each instance. 2.) Yes, by all means you should use Linux instead of Windows. The graphics output is completely compatible with whatever applications you want to paste it into on Windows. Linux is cheaper, more stable, and better at using the system's resources. 3.) If you're doing big datasets, you certainly need a 64-bit processor, operating system, and R. Consider, perhaps, a dual-Athlon 64 machine with a big pile of RAM? Andy -- Andrew J Perrin - andrew_perrin (at) unc.edu - http://perrin.socsci.unc.edu Assistant Professor of Sociology; Book Review Editor, _Social Forces_ University of North Carolina - CB#3210, Chapel Hill, NC 27599-3210 USA New Book: http://www.press.uchicago.edu/cgi-bin/hfs.cgi/00/178592.ctl On Thu, 15 Mar 2007, Sascha Morach wrote: [original question snipped]
Re: [R] cex in xlab, ylab and zlab of persp
I had similar problems; actually it is very difficult to customize persp graphics. You should try wireframe in lattice instead. This reference might help to customize wireframe plots: http://www.polisci.ohio-state.edu/faculty/lkeele/3dinR.pdf JR On Wed, 14-03-2007 at 20:40 -0700, Joseph Retzer wrote: [original question snipped]
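To make the lattice route JR suggests concrete, here is a minimal sketch (not from the thread; the surface and label text are invented for illustration). In wireframe, each axis label can be given as a list carrying its own cex:

```r
library(lattice)

# a toy surface to plot (hypothetical data)
g <- expand.grid(x = seq(-3, 3, by = 0.2), y = seq(-3, 3, by = 0.2))
g$z <- dnorm(g$x) * dnorm(g$y)

# each label is a list with its own cex, controlling label size directly
p <- wireframe(z ~ x * y, data = g,
               xlab = list("x axis", cex = 1.5),
               ylab = list("y axis", cex = 1.5),
               zlab = list("density", cex = 1.5))
print(p)  # lattice objects must be print()ed inside scripts
```

This avoids the persp/cex.lab problem entirely, at the cost of switching to grid-based graphics.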
Re: [R] Error in upgrade
Giovanni Parrinello said the following on 3/15/2007 6:43 AM: Dear All, update.packages(ask='graphics') --- Please select a CRAN mirror for use in this session --- Error in .readRDS(pfile) : unknown input format. ??? TIA Giovanni I cannot replicate this in R-2.4.1. What version of R and repository are you using? update.packages(ask = "graphics", repos = "http://cran.cnr.berkeley.edu") --sundar
Re: [R] Redirecting output to the screen
This is much easier in R-devel: just use message() and scan("stdin"). gannet% cat Test.R

message("Enter file name: ", appendLF=FALSE)
fn <- scan("stdin", what="", n=1)

works for me in R-devel via R --vanilla -f Test.R > Rout.txt I believe it also works under Windows. On Wed, 14 Mar 2007, John Schuenemeyer wrote: A simple example follows. The file is called Test.R

# Example
rm(list=ls(all=TRUE))
cat("Enter file name")
fn <- scan(what="")

I execute the following: @C:\PROGRA~1\R\R-2.4.1\bin\Rterm.exe --no-restore --no-save < Test.R > Rout.txt I do not see "Enter file name" or have the opportunity to enter the file name. I am running R in Windows XP. Thanks for your help. John H Schuenemeyer Southwest Statistical Consulting, LLC 960 Sligo St Cortez, CO 81321-2558 Phone: 970.565.0179 URL: www.swstatconsult.com PLEASE do, and not send HTML. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] how to...
I have to perform ANOVAs on many different data organized in a dataframe. I can run an ANOVA for each sample, but I've got hundreds of data and I would like to avoid manually carrying out each test. In addition, I would like to have the results organized in a simple way, for example in a table, which would be easy to export. Thank you for assistance simone
[R] Seemingly Unrelated Regressions question - not R question
Does anyone know where I can find a proof of the fact that when each X matrix in a SUR is the same, then SUR estimation is equivalent to OLS estimation? The proof is supposedly in William Greene's book, but that book costs 157.00 and has mixed reviews, so I am reluctant to purchase it. Thanks.
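Greene's result is easy to verify numerically in R. With identical regressors X0 in every equation, the stacked system has design I kron X0 and error covariance Sigma kron I, and the GLS estimator collapses algebraically to equation-by-equation OLS. A small simulation sketch (all numbers below are invented for illustration):

```r
set.seed(42)
n  <- 50
X0 <- cbind(1, rnorm(n), rnorm(n))               # same X in both equations
B  <- matrix(c(1, 2, -1, 0.5, -2, 3), ncol = 2)  # true coefficients
E  <- matrix(rnorm(2 * n), ncol = 2) %*% chol(matrix(c(1, 0.8, 0.8, 2), 2))
Y  <- X0 %*% B + E                               # correlated errors across equations

X <- kronecker(diag(2), X0)  # stacked block-diagonal design
y <- as.vector(Y)            # vec(Y)

Sigma     <- crossprod(E) / n                  # cross-equation covariance
Omega.inv <- kronecker(solve(Sigma), diag(n))  # inverse of Sigma kron I_n

b.sur <- solve(t(X) %*% Omega.inv %*% X, t(X) %*% Omega.inv %*% y)  # GLS/SUR
b.ols <- solve(crossprod(X), crossprod(X, y))                        # OLS

all.equal(as.vector(b.sur), as.vector(b.ols))  # TRUE: SUR == OLS here
```

The algebra behind it: with X = I kron X0 and Omega = Sigma kron I, X'Omega^-1 X = Sigma^-1 kron (X0'X0), so the Sigma terms cancel and the estimator reduces to (X0'X0)^-1 X0' applied to each equation.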
Re: [R] Creating q and p functions from a self-defined distribution
[EMAIL PROTECTED] 15/03/2007 13:26:52 On 15-Mar-07 12:09:42, Eli Gurarie wrote: Hello all, I am fishing for some suggestions on efficient ways to make qdist and pdist type functions from an arbitrary distribution whose probability density function I've defined myself. Ted Harding accurately said there is unlikely to be an efficient general answer. However, if you want a dreadfully inefficient but very general answer, you could use a numerical integral to calculate the cumulative probabilities, and a root-finder to find quantiles from the integral function. (Strictly, the quantile would be the root of {cumulative integral} - p, where p is where you want the quantile.) uniroot is the general-purpose root-finder in R; it isn't ideal for this purpose because it won't like an interval of +/-Inf, which you will certainly need for general quantiles. But you should be able to make it work with minimal tinkering if you use it on logit(p) and apply it to the interval [0,1]. Converting back from logit after uniroot should then give you your quantile. You may also want to give it a smaller tolerance; the default looks like it's geared to a sum-of-squares minimiser, and may be a bit over-generous for your purpose. Steve Ellison.
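A rough sketch of this integrate-plus-uniroot approach (without the logit reparameterisation Steve describes; a finite bracketing interval is simply assumed here, so treat it as illustration only):

```r
# cumulative probability of a user-defined, vectorised pdf `ddist`
pdist <- function(q, ddist, lower = -Inf, ...)
  sapply(q, function(qi) integrate(ddist, lower = lower, upper = qi, ...)$value)

# quantile: root of pdist(x) - p, bracketed by a finite interval
qdist <- function(p, ddist, lower = -Inf, interval = c(-50, 50), ...)
  sapply(p, function(pi)
    uniroot(function(x) pdist(x, ddist, lower = lower, ...) - pi,
            interval = interval, tol = 1e-9)$root)

# sanity check against the normal distribution
pdist(1.96, dnorm)   # close to pnorm(1.96)
qdist(0.975, dnorm)  # close to qnorm(0.975)
```

Dreadfully inefficient, as promised (each uniroot step re-integrates from scratch), but it works for any reasonably behaved density.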
[R] Cannot allocate vector size of... ?
Hello all, I've been working with Fridolin Wild's lsa package for R a bit over the past few months, but I'm still pretty much a novice. I have a lot of files that I want to use to create a semantic space. When I begin to run the initial textmatrix( ), it runs for about 3-4 hours and eventually gives me an error. It's always ERROR: cannot allocate vector size of xxx Kb. I imagine this might be my computer running out of memory, but I'm not sure. So I thought I would send this to the community at large for any help/thoughts. I searched the archives and didn't really find anything that specifically speaks to my situation. So I guess I have a few questions. First, is this actually an issue with the machine running out of memory? If not, what might be the cause for the error? If so, is there a way to minimize the amount of memory used by the vector data structures (e.g., Berkeley DB)? Thanks, Gabe Wingfield IT and Program Specialist I Center for Applied Social Research University of Oklahoma 2 Partners Place 3100 Monitor, Suite 100 Norman, OK 73072
Re: [R] Density estimation graphs
Mark Wardle wrote: Dear all, I'm struggling with a plot and would value any help! ... Is there a better way? As always, I'm sure there's a one-liner rather than my crude technique! As always, I've spent ages trying to sort this, and then the minute after sending an email, I find the polygon() function. Ignore previous message! Best wishes, Mark
[R] Density estimation graphs
Dear all, I'm struggling with a plot and would value any help! I'm attempting to highlight a histogram and density plot to show a proportion of cases above a threshold value. I wanted to cross-hatch the area below the density curve. The breaks and bandwidth are deliberate integer values because of the type of data I'm looking at. I've managed to do this, but I don't think it is very good! It would be difficult, for example, to do a cross-hatch using this technique.

allele.plot <- function(x, threshold=NULL, hatch.col='black',
                        hatch.border=hatch.col, lwd=par('lwd'), ...) {
  h <- hist(x, breaks=max(x), plot=FALSE)
  d <- density(x, bw=1)
  plot(d, lwd=lwd, ...)
  if (!is.null(threshold)) {
    d.t <- d$x > threshold
    d.x <- d$x[d.t]
    d.y <- d$y[d.t]
    d.l <- length(d.x)
    # draw all but first line of hatch
    for (i in 2:d.l) {
      lines(c(d.x[i], d.x[i]), c(0, d.y[i]), col=hatch.col, lwd=1)
    }
    # draw first line in hatch border colour
    lines(c(d.x[1], d.x[1]), c(0, d.y[1]), col=hatch.border, lwd=lwd)
    # and now re-draw density plot lines
    lines(d, lwd=lwd)
  }
}

# some pretend data
s8 <- rnorm(100, 15, 5)
threshold <- 19  # an arbitrary cut-off
allele.plot(s8, threshold, hatch.col='grey', hatch.border='black')

Is there a better way? As always, I'm sure there's a one-liner rather than my crude technique! Best wishes, Mark -- Dr. Mark Wardle Clinical research fellow and specialist registrar, Neurology University Hospital Wales and Cardiff University, UK
Re: [R] Cannot allocate vector size of... ?
Yes, your error is due to running out of memory. This is probably one of the most frequent questions asked here, so if you search again you can find a lot of advice on how to get around it. As you learn more about R programming you will learn how to store data more efficiently, rm() to remove variables you no longer need, and gc() to garbage-collect and free up memory. Try to open only the files you need, do some of the analysis, then get rid of everything you don't need, then do some more analysis. Thanks, Roger -Original Message- From: Wingfield, Jerad G. Sent: Thursday, March 15, 2007 12:27 PM To: r-help@stat.math.ethz.ch Subject: [R] Cannot allocate vector size of... ? [original question snipped]
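Roger's advice in code form, as a generic illustration (not tied to the lsa package; the objects here are invented):

```r
x <- matrix(rnorm(1e6), ncol = 100)  # roughly an 8 MB object
print(object.size(x))                # see what an object costs in memory

y <- x[1:10, ]  # keep only the part you actually need ...
rm(x)           # ... and drop the rest as soon as possible
gc()            # garbage-collect: frees memory and reports current usage
```

The same pattern scales up: load a file, reduce it to what the analysis needs, rm() the raw object, gc(), and only then load the next file.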
[R] [e1071] predict.svm bug ?
Hi all, I'm using SVM to classify data (2 classes) and I get strange results:

model <- svm(x, y, probability = TRUE)
pred <- predict(model, x, decision.values = TRUE, probability = FALSE)
table(pred, y)
     y
pred  ctl nuc
  ctl  82   3
  nuc   5  84
pred
  1   2   3   4   5   6   7   8
nuc nuc nuc nuc nuc nuc nuc ctl

And now, with probability computation:

pred <- predict(model, x, decision.values = TRUE, probability = TRUE)
table(pred, y)
     y
pred  ctl nuc
  ctl   7  84
  nuc  80   3
pred
  1   2   3   4   5   6   7   8
ctl ctl ctl ctl ctl ctl ctl nuc
...

However, model, x, and y didn't change!! Also, decision.values didn't change:

  nuc/ctl
1  0.505289854
2  0.265975135
3  0.863270144
4  0.354181677
5  0.868119168
6  0.702989607
7  0.206018067
8 -0.271452937

- ctl is correct! Is it a bug? Could you explain the difference? Best regards, charles -- Charles Hébert DYnamique et Organisation des GENomes, Laboratoire de Génétique Moléculaire, CNRS UMR 8541 Ecole Normale Supérieure 46, rue d'Ulm 75005 Paris, France
[R] plot.randomForest default mtry values
When using the plot.randomForest method, 3 error series (by number of trees) are plotted. I suspect they are associated with the 3 default values of mtry that are used, for example, in the tuneRF method, but I'm not sure. Could someone confirm? Also, is it possible to force different values of mtry to be used when creating the plots? I specified them explicitly in the randomForest statement but it did not seem to have an effect. Many thanks, Joe Retzer
Re: [R] Density estimation graphs
On Mar 15, 2007, at 12:37 PM, Mark Wardle wrote: [original question snipped] Don't know about a cross-hatch, but in general I use polygon for highlighting areas like that:

allele.plot <- function(x, threshold=NULL, hatch.col='black',
                        hatch.border=hatch.col, lwd=par('lwd'), ...) {
  h <- hist(x, breaks=max(x), plot=FALSE)
  d <- density(x, bw=1)
  plot(d, lwd=lwd, ...)
  if (!is.null(threshold)) {
    d.t <- d$x > threshold
    d.x <- d$x[d.t]
    d.y <- d$y[d.t]
    polygon(c(d.x[1], d.x, d.x[1]), c(0, d.y, 0), col=hatch.col, lwd=1)
  }
}

# some pretend data
s8 <- rnorm(100, 15, 5)
threshold <- 19  # an arbitrary cut-off
allele.plot(s8, threshold, hatch.col='grey', hatch.border='black')

Perhaps this can help a bit. Btw, what was d.l for? Haris Skiadas Department of Mathematics and Computer Science Hanover College
Re: [R] plot.randomForest default mtry values
Joe, I'm guessing you are doing a 2-category problem. The three lines are OOB errors: the overall error and one for each of the two categories. There is only one default value of mtry. You can specify a different mtry when the forest is built (in your call to randomForest()), but it applies to the entire forest. On 3/15/07, Joseph Retzer wrote: [original question snipped] -- HTH, Jim Porzak Loyalty Matrix Inc. San Francisco, CA http://www.linkedin.com/in/jimporzak
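The lines Jim describes come straight from the fitted object's err.rate matrix: one OOB column plus one column per class, and plot(rf) simply draws those columns against the number of trees. A quick sketch on a built-in dataset (iris has three classes, so here there are four lines rather than Joe's three):

```r
library(randomForest)
data(iris)
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, mtry = 2, ntree = 300)

colnames(rf$err.rate)  # "OOB" plus one column per class
# plot(rf) is essentially this:
matplot(rf$err.rate, type = "l", xlab = "trees", ylab = "error")
```

To compare different mtry values you would fit one forest per value and overlay their err.rate columns yourself; a single plot(rf) only ever shows the one mtry the forest was grown with.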
[R] How to use result of approxfun in a package?
I am working on a project where we start with 2 long, equal-length vectors, massage them in various ways, and end up with a function mapping one interval to another. I'll call that function f1. The last step in R is to generate f1 as the value of the approxfun function. I would like to put f1 into a package, but without having the package redo the creation of function f1. The value of approxfun is a function; for example, I have

f1
function (v)
.C("R_approx", as.double(x), as.double(y), as.integer(n),
   xout = as.double(v), as.integer(length(v)), as.integer(method),
   as.double(yleft), as.double(yright), as.double(f),
   NAOK = TRUE, PACKAGE = "base")$xout
<environment: 0x17719324>

I don't really understand how this is stored, and in particular, how I should handle it so as to include the function f1 in a package. I would like the users to be able to load the package and use f1 directly, rather than re-generate it using approxfun. Thanks, Steve --- Steve Buyske (rhymes with "nice key") Associate Research Professor Department of Statistics Biostatistics Rutgers University Hill Center, 110 Frelinghuysen Rd Piscataway, NJ 08854 [EMAIL PROTECTED]
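One way this can work, sketched below: the x and y vectors live on in the closure's environment, so serialising f1 with save() captures everything the function needs. (The data and file names here are invented; in a package the .rda would typically go under data/, or into R/sysdata.rda for internal objects.)

```r
x  <- seq(0, 1, length.out = 200)
y  <- cumsum(runif(200))
f1 <- approxfun(x, y)  # a closure: x and y are kept in its environment

tmp <- tempfile(fileext = ".rda")
save(f1, file = tmp)   # serialises f1 *and* its environment

rm(f1, x, y)           # simulate a fresh session
load(tmp)
f1(0.5)                # works without re-running approxfun
```

So shipping the saved object is enough; users who load the package (or load() the .rda) get a working f1 directly.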
[R] Calculating Deviance from log-likelihood
Hi, This is a bit R-unrelated; I want, however, to use my results from this in an R function. I am trying to figure out a formula for the deviance given a likelihood function. In my derivation I end up with ln(y-1) twice in my expression for the deviance (see attached pdf), which makes it impossible to calculate a value for the deviance since y can be = 1. I know this expression can be simplified differently, avoiding ln(y-1). Does someone have a suggestion? I would appreciate your help very much! Thank you, Benjamin deviance.pdf Description: Adobe PDF document
[R] logistic regression
Dear All, I would like to fit, and know the R2 of, the following presence/absence data: x <- 1:10 y <- c(0,0,0,0,0,0,0,1,1,1) I tried to use clogit (survival package) but it didn't work. Any idea? miltinho
[R] vars :VARMA, multivariate time series?
I have a multivariate time series and I would like to build a forecasting model with both AR and MA terms; I think this is possible in R. I have looked at the vars package and it looks like it is possible to estimate MA terms using the Phi and Psi functions, but I am not sure how to incorporate the estimated terms into a forecasting model. I have also looked at the dse package, but have not been able to determine how to estimate a VARMA model from the documentation. Any direction on which packages to use for VARMA would be appreciated. Thanks, Spencer
Re: [R] plot.randomForest default mtry values
Joe, here is a piece of code copied from my blog, showing how to tune mtry. HTH.

library(MASS)
library(randomForest)
data(Boston)
set.seed(2007)

# SEARCH FOR BEST VALUE OF MTRY FOR RANDOM FORESTS
mtry <- tuneRF(Boston[, -14], Boston[, 14], mtryStart = 1,
               stepFactor = 2, ntreeTry = 500, improve = 0.01)
best.m <- mtry[mtry[, 2] == min(mtry[, 2]), 1]

# FIT A RF MODEL
rf <- randomForest(medv ~ ., data = Boston, mtry = best.m,
                   ntree = 1000, importance = TRUE)

# EXTRACT VARIABLE IMPORTANCE
imp.tmp <- importance(rf, type = 1)
rf.imp <- imp.tmp[order(imp.tmp[, 1], decreasing = TRUE), ]
par(mar = c(3, 0, 4, 0))
barplot(rf.imp, col = gray(0:(ncol(Boston) - 1)/(ncol(Boston) - 1)),
        names.arg = names(rf.imp), yaxt = "n", cex.names = 1)
title(main = list("Importance Rank of Predictors", font = 4, cex = 1.5))

# PLOT PARTIAL DEPENDENCE OF EACH PREDICTOR
par(mfrow = c(3, 5), mar = c(2, 2, 2, 2), pty = "s")
for (i in 1:(ncol(Boston) - 1)) {
  partialPlot(rf, Boston, names(Boston)[i], xlab = names(Boston)[i], main = NULL)
}

On 3/15/07, Joseph Retzer wrote: [original question snipped] -- WenSui Liu A lousy statistician who happens to know a little programming (http://spaces.msn.com/statcompute/blog)
Re: [R] Cannot allocate vector size of... ?
Oops. Yep, I totally forgot my specs and such. I'm currently running R-2.4.1 on a 64-bit Linux box (Fedora Core 6) with 4GB of RAM. The files are 10-50Kb on average, but this error came about when only working with ~16,000 of them. The final size of the corpus is ~1.7M files. So, obviously, this memory thing is going to be a large issue for me. I'm going through re-searching the help list archives, and now it looks like I have S Poetry to read as well. Thanks for all the suggestions. Any others are greatly appreciated as well. Gabe Wingfield IT and Program Specialist I Center for Applied Social Research University of Oklahoma 2 Partners Place 3100 Monitor, Suite 100 Norman, OK 73072 -Original Message- From: Patrick Burns [mailto:[EMAIL PROTECTED] Sent: Thursday, March 15, 2007 12:31 PM To: Wingfield, Jerad G. Subject: Re: [R] Cannot allocate vector size of... ? You can find a few things not to do (things that waste memory) in S Poetry. You don't say how much memory your machine has, nor how big your objects are. However, it is possible that getting more memory for your machine might be the best thing to do. Patrick Burns [EMAIL PROTECTED] +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and A Guide for the Unwilling S User) Wingfield, Jerad G. wrote: [original question snipped]
Re: [R] logistic regression
On 15-Mar-07 17:03:50, Milton Cezar Ribeiro wrote: Dear All, I would like adjust and know the R2 of following presence/absence data: x-1:10 y-c(0,0,0,0,0,0,0,1,1,1) I tryed use clogit (survival package) but it don´t worked. Any idea? miltinho You are trying to fit an equation P[y = 1 ; x] = exp((x-a)/b))/(1 + exp((x-a)/b)) to data x = 1 2 3 4 5 6 7 8 9 10 y = 0 0 0 0 0 0 0 1 1 1 by what amounts to a maximum-likelihood method, i.e. which chooses the parameter values to maximize the probability of the observed values of y (given the values of x). The maximum probability possible is 1, so if you can find parameters which make P[y = 1] = 0 for x = 1, 2, ... , 7 and P[y = 1] for x = 8, 9, 10 then you have done it. This will be approximated as closely as you please for any value of a between 7 and 8, and sufficiently small values of b, since for such parameter values P[y = 1 ; x] - 0 for x a, and - 1 for x a. You therefore have a solution which is both indeterminate (any a such that 7 a 8) and singular (b - 0). So it will defeat standard estimation methods. That is the source of your problem. In a more general context, this is an instance of the linear separation problem in logistic regression (and similar methods, such a probit analysis). Basically, this situation implies that, according to the data, there is a perfect prediction for the results. There is no well-defined way of dealing with it; any approach starts from the proposition this perfect prediction is not a reasonable result in the context of my data, and continues by following up what you think should be meant by not a reasonable result. What this is likely to mean would be on the lines of b should not be that small, which then imposes upon you the need to be more specific about how small b may reasonably be. Then carry on from there (perhaps by fixing the value of b at different reasonable levels, and simply fitting a for each value of b). Hoping this helps ... 
but I'm wondering how it happens that you have such data ... ?? best wishes, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 15-Mar-07 Time: 19:38:51 -- XFMail --
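Ted's analysis can be seen directly in R. The sketch below (an addition, not the clogit call the original poster tried) fits the same data with glm(), the standard logistic-regression tool: glm warns that fitted probabilities of 0 or 1 occurred, and the slope estimate blows up, exactly the singular b -> 0 behaviour described above.

```r
# Perfectly separated data from the thread
x <- 1:10
y <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)

# glm() warns "fitted probabilities numerically 0 or 1 occurred"
fit <- suppressWarnings(glm(y ~ x, family = binomial))

coef(fit)    # the slope (1/b in Ted's notation) is enormous
fitted(fit)  # essentially 0 for x <= 7 and essentially 1 for x >= 8
```

The fitted curve is a step function between x = 7 and x = 8, which is why no finite maximum-likelihood estimate exists.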
Re: [R] how to...
Hi, I suppose you will not get a useful response to such a poorly specified question. For automating procedures on data frames you can either do looping or use lapply; maybe do.call can also provide some functionality. If you elaborate on what you did and in what respect it was unsatisfactory, maybe you will get a better answer. Anyway, before your next post you should look at the posting guide. Regards, Petr

On 15 Mar 2007 at 17:20, [EMAIL PROTECTED] wrote: Date sent: Thu, 15 Mar 2007 17:20:57 +0100 From: [EMAIL PROTECTED] To: R Help R-help@stat.math.ethz.ch Subject: [R] how to... I have to perform ANOVAs on many different data organized in a dataframe. I can run an ANOVA for each sample, but I've got hundreds of data and I would like to avoid manually carrying out each test. In addition, I would like to have the results organized in a simple way, for example in a table, which would be easy to export. Thank you for assistance, simone

Petr Pikal [EMAIL PROTECTED]
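A minimal sketch of the lapply approach Petr mentions (the data, the column names y1..y10, and the factor `group` are all made up for illustration): run one aov() per response column and bind the F statistics and p-values into a single table that is easy to export.

```r
# One ANOVA per response column, results collected into one data frame
set.seed(1)
dat <- data.frame(group = gl(3, 10),
                  matrix(rnorm(300), nrow = 30,
                         dimnames = list(NULL, paste0("y", 1:10))))

results <- do.call(rbind, lapply(names(dat)[-1], function(v) {
  tab <- summary(aov(dat[[v]] ~ dat$group))[[1]]  # the ANOVA table
  data.frame(variable = v,
             F.value  = tab[1, "F value"],
             p.value  = tab[1, "Pr(>F)"])
}))
results  # one row per response; write.csv(results, "anova.csv") exports it
```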
Re: [R] logistic regression TRY LOGISTF
If Ted is right, then one work-around is to use Firth's method for penalized log-likelihood. The technique was originally intended to reduce small-sample bias. However, it's now being extended to deal with complete and quasi-separation problems. I believe the library is called logistf, but I haven't had a chance to try it. I know the SAS version (called the fl macro) works fine. Reference -- http://www.meduniwien.ac.at/user/georg.heinze/techreps/tr2_2004.pdf Hope this helps, Jeff Miller University of Florida AlphaPoint05, Inc.

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding Sent: Thursday, March 15, 2007 2:39 PM To: R-help Subject: Re: [R] logistic regression

[Ted Harding's reply, quoted in full earlier in this thread]
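For the record, a sketch of what the logistf call might look like on the data from this thread. This is an assumption on my part: logistf is a CRAN package, not part of base R, and the code is guarded so it only runs if the package is installed.

```r
# Firth-penalized logistic regression for the completely separated data;
# requires the CRAN package 'logistf' (guarded below in case it is absent)
if (requireNamespace("logistf", quietly = TRUE)) {
  dat <- data.frame(x = 1:10, y = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1))
  fit <- logistf::logistf(y ~ x, data = dat)
  summary(fit)  # finite penalized estimates even under complete separation
}
```

Unlike plain glm(), the penalized fit returns finite coefficient estimates, which is the point of Firth's correction here.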
[R] unbiased sandwich variance estimator for GEE estimates
Hi, does anyone know of an existing package/program that has implemented an unbiased (or bias-reduced) sandwich variance estimator (Wu (1986) and Carroll (2001)) for GEE estimates? Thanks, Qiong
[R] Model selection in LASSO (cross-validation)
Hi, I know how to use LASSO for model selection based on the Cp criterion. I heard that we can also use cross-validation as a criterion. I used cv.lars to give me the lowest predicted error fraction. But I'm short of a step to arrive at the number of variables to be included in the final model. How do we do that? Is it the predict.lars function? I tried

  logprostate.plars.cv <- predict.lars(logprostate.lars.cv, M, type = "fit", mode = "fraction")

but it gives me the error message: Error in dim(data) <- dim : attempt to set an attribute on NULL. Please help! Thanks!
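For what it's worth, here is a sketch of going from cv.lars() to the variables retained at the CV-chosen fraction. This assumes the CRAN package lars (guarded below), uses simulated data, and note two details: type and mode must be quoted strings, and predict.lars() should be called on the lars fit (not on the cv.lars result) with the chosen fraction passed as s. In some older lars versions the CV abscissa component is called $fraction rather than $index.

```r
# From cross-validation to the selected coefficients (lars package assumed)
if (requireNamespace("lars", quietly = TRUE)) {
  library(lars)
  set.seed(1)
  x <- matrix(rnorm(100 * 5), 100, 5)
  y <- x[, 1] + rnorm(100)

  cvfit  <- cv.lars(x, y, K = 10, plot.it = FALSE)  # CV error curve
  s.best <- cvfit$index[which.min(cvfit$cv)]        # fraction minimizing CV error

  fit <- lars(x, y, type = "lasso")
  coef.best <- predict(fit, s = s.best, type = "coefficients",
                       mode = "fraction")$coefficients
  sum(coef.best != 0)  # number of variables in the final model
}
```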
[R] Covariance matrix calc method question
I have been comparing the output of an R package to S+Finmetrics and I notice that the covariance matrices output by the two procedures are different. The R package computes the covariance matrix using Method 1 and I think (but I'm not sure) that S+Finmetrics computes it using Method 2. I put a correction factor (see below) into Method 2 in order to deal with the fact that the var function calculates the unbiased estimate of variance. This gives the same answer from both methods for the data shown, which I just made up. But my hope is that, for much larger problems, leaving the correction factor out can cause large differences in the determinant? That would explain why some of the output (AIC, BIC) differs between the two packages. I'm not really sure how to check this easily, but if someone has an idea it would be appreciated. Basically, I want to run some sort of simulation along the lines of the below to check whether this could be the reason for the differences. Thanks.

x <- c(11,12,13,14,16)
y <- c(2,4,6,8,12)
z <- c(14,18,22,50,20)
LHS <- cbind(x,y)
n.obs <- nrow(LHS)  # (originally named 'sample', which shadows base::sample)

# METHOD 1
output1 <- lm(LHS ~ z)
resids <- resid(output1)
sigma.hat1 <- crossprod(resids)/n.obs
print(sigma.hat1)
print(det(sigma.hat1))

# METHOD 2
fit1 <- lm(LHS[,1] ~ z)
fit2 <- lm(LHS[,2] ~ z)
correctionfactor <- (n.obs - 1)/n.obs  # note the parentheses
sigma.hat2 <- correctionfactor*var(cbind(resid(fit1),resid(fit2)))
print(sigma.hat2)
print(det(sigma.hat2))
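As a quick check of the algebra: for mean-zero residuals, crossprod(r)/n and ((n-1)/n)*var(r) are algebraically identical, so with the correction factor the two methods must agree for any data set, not just this one. Without the correction, the determinant of a k-variate covariance estimate is inflated by (n/(n-1))^k, which is exactly the kind of scale difference that would feed into AIC/BIC. A small sketch (data simulated):

```r
# crossprod(r)/n == ((n-1)/n) * var(r) for centered residuals
set.seed(1)
r <- matrix(rnorm(20), 10, 2)
r <- scale(r, center = TRUE, scale = FALSE)  # force zero column means
n <- nrow(r)

m1 <- crossprod(r) / n          # Method 1 (ML-style estimate)
m2 <- ((n - 1) / n) * var(r)    # Method 2 with the correction factor
all.equal(c(m1), c(m2))         # TRUE

# Without the correction the determinant is off by (n/(n-1))^k;
# for n = 5 observations and k = 2 responses, as in the made-up data,
# that is (5/4)^2, i.e. about 56 percent:
(5/4)^2
```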
Re: [R] expm() within the Matrix package
Laura Hill wrote: Hi, could anybody give me a bit of advice on some code I'm having trouble with? I've been trying to calculate the log-likelihood of a function iterated over a data set of time values, and I seem to be experiencing difficulty when I use the function expm(). Here's an example of what I am trying to do:

y <- c(5,10)                 # vector of 2 survival times
p <- Matrix(c(1,0),1,2)      # 1x2 matrix
Q <- Matrix(c(1,2,3,4),2,2)  # 2x2 matrix
q <- Matrix(c(1,2),2,1)      # 2x1 matrix
Loglik <- rep(0,length(y))   # empty vector of same length as y
for(i in 1:length(y)){
  Loglik[i] <- log((p %*% (expm(Q*y[i]))) %*% q)  # log-likelihood for each y[i]
}

The code works perfectly if I use exp(Q*y[i]) but not for expm().

You need to do a type conversion here, because you get the loglik as a 1x1 Matrix, instead of a scalar (after running your code):

log(p %*% expm(Q * y[i]) %*% q)
1 x 1 Matrix of class "dgeMatrix"
         [,1]
[1,] 134.5565

If you convert to numeric, you can then assign it to Loglik:

Loglik[1] <- as.numeric(log(p %*% expm(Q * y[i]) %*% q))
Loglik[1]
[1] 134.5565

-- Gad Abraham Department of Mathematics and Statistics The University of Melbourne Parkville 3010, Victoria, Australia email: [EMAIL PROTECTED] web: http://www.ms.unimelb.edu.au/~gabraham
Re: [R] replacing all NA's in a dataframe with zeros...
Thanks, one and all... I knew it had to be simple.

On 3/14/07, Jason Barnhart [EMAIL PROTECTED] wrote: This should work.

test.df <- data.frame(x1=c(NA,2,3,NA), x2=c(1,2,3,4), x3=c(1,NA,NA,4))
test.df
  x1 x2 x3
1 NA  1  1
2  2  2 NA
3  3  3 NA
4 NA  4  4
test.df[is.na(test.df)] <- 1000
test.df
    x1 x2   x3
1 1000  1    1
2    2  2 1000
3    3  3 1000
4 1000  4    4

The following search string, cran r replace data.frame NA, in Google (as US user) yielded some good results (5th and 7th entry), but there was another example that explicitly yielded this technique. I can't seem to recall my exact search string.

----- Original Message ----- From: David L. Van Brunt, Ph.D. [EMAIL PROTECTED] To: R-Help List r-help@stat.math.ethz.ch Sent: Wednesday, March 14, 2007 5:22 PM Subject: [R] replacing all NA's in a dataframe with zeros...

I've seen how to replace the NA's in a single column of a data frame:

mydata$ncigs[is.na(mydata$ncigs)] <- 0

But this is just one column... I have thousands of columns (!) where I need to do this, and I can't figure out a way, outside of the dreaded loop, to replace all NA's in an entire data frame (all vars) without naming each var separately. Yikes. I'm racking my brain on this; it seems like I must be staring at the obvious, but it eludes me. Searches have come up CLOSE, but not quite what I need. Any pointers?

-- --- David L. Van Brunt, Ph.D. mailto:[EMAIL PROTECTED] "If Tyranny and Oppression come to this land, it will be in the guise of fighting a foreign enemy." --James Madison
[R] multiple scores per subject
Hi, I have a data set that looks like this:

data
   vara varb S  PC
1  None  250 1  80
2  None  250 1  70
3  Some  250 1  60
4  Some  250 1  70
5  None 1000 1  90
6  None 1000 1  90
7  Some 1000 1  80
8  Some 1000 1  70
9  None  250 2 100
10 None  250 2  80
11 Some  250 2  70
12 Some  250 2  70
13 None 1000 2 100
14 None 1000 2  90
15 Some 1000 2  50
16 Some 1000 2  40

... And so on. The last column is the dependent variable, and I have made the other columns factors. As you can see, there are multiple scores for each subject in each combination of conditions. How can I reduce the dataset so that there is only 1 score per subject, per condition, for further analysis? I can use tapply to get means, but I need a data.frame for analysis (aov). Any ideas? -- Chris
[R] how to determine the relative distance between a DNA sequence and know genes
Hi, recently I got a bunch of DNA sequences (with a chromosome coordinate for each sequence). I would like to know the relative distance between these sequences and all the genes (genomic sequences) on the human genome, i.e., do these sequences lie upstream of the genes (5' end), downstream of the genes (3' end), or within the genes? I have about 8000 sequences. Any package or methods that could help me? Thanks. -zrl
Re: [R] multiple scores per subject
Does this do what you want?

x <- "vara varb S PC
+ 1 None 250 1 80
+ 2 None 250 1 70
+ 3 Some 250 1 60
+ 4 Some 250 1 70
+ 5 None 1000 1 90
+ 6 None 1000 1 90
+ 7 Some 1000 1 80
+ 8 Some 1000 1 70
+ 9 None 250 2 100
+ 10 None 250 2 80
+ 11 Some 250 2 70
+ 12 Some 250 2 70
+ 13 None 1000 2 100
+ 14 None 1000 2 90
+ 15 Some 1000 2 50
+ 16 Some 1000 2 40"
x.in <- read.table(textConnection(x), header=TRUE)
(x.agg <- aggregate(x.in$PC, list(vara=x.in$vara, varb=x.in$varb, S=x.in$S), mean))
  vara varb S  x
1 None  250 1 75
2 Some  250 1 65
3 None 1000 1 90
4 Some 1000 1 75
5 None  250 2 90
6 Some  250 2 70
7 None 1000 2 95
8 Some 1000 2 45

On 3/15/07, Christopher Brown [EMAIL PROTECTED] wrote: [original question quoted above]

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
[R] MANOVA permutation testing
Hi, I've got a dataset with 7 variables for 8 different species. I'd like to test the null hypothesis of no difference among species for these variables. MANOVA seems like the appropriate test, but since I'm unsure of how well the data fit the assumptions of equal variance/covariance and multivariate normality, I want to use a permutation test. I've been through CRAN looking at the packages boot, bootstrap, coin, and permtest, but they all seem to be doing more than I need. Is the following code an appropriate way to test my hypothesis?

result.vect <- c()
for (i in 1:1000){
  wilks <- summary.manova(manova(maxent ~ sample(max.spec)), test="Wilks")$stats[1,2]
  result.vect <- c(result.vect, wilks)
}

maxent is the data, max.spec is a vector of species names. Comparing result.vect with the Wilks value for the unpermuted data suggests there are very significant differences among species -- but did I do this properly? -- Regards, Tyler Smith
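The logic above can be made self-contained with the built-in iris data (a sketch of the same test, not the poster's maxent data): preallocate the result vector rather than growing it with c(), and turn the comparison into an explicit permutation p-value. Since a small Wilks' lambda indicates strong group separation, count the permutations whose statistic is at least as small as the observed one.

```r
# Permutation MANOVA (Wilks' lambda) on the built-in iris data
set.seed(42)
Y <- as.matrix(iris[, 1:4])
species <- iris$Species
obs <- summary(manova(Y ~ species), test = "Wilks")$stats[1, 2]

nperm <- 199
perm <- numeric(nperm)  # preallocated, faster than c() in a loop
for (i in seq_len(nperm)) {
  perm[i] <- summary(manova(Y ~ sample(species)), test = "Wilks")$stats[1, 2]
}

# permutations at least as extreme (as small) as the observed statistic,
# with the conventional +1 so the p-value is never exactly zero
p.value <- (sum(perm <= obs) + 1) / (nperm + 1)
p.value
```

On these data no permutation comes close to the observed statistic, so the p-value is at its minimum possible value of 1/(nperm + 1).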
[R] How can I place axis annotations away from axis?
Hello Experts, I have the following code and data for 2 interpolation plots. http://www.nabble.com/file/7206/3d_plot_data.txt 3d_plot_data.txt

data <- read.table("3d_plot_data.txt", header=TRUE)
attach(data)
par(mfrow=c(1,2))
library(akima)
interpolation <- interp(rr, veg_r, predict)
persp(interpolation, theta = -45, phi = 30, ticktype = "detailed", nticks=4,
      cex=0.8, expand=0.5, xlab="\n\n\nPrecipitation", ylab="\n\n\nVegetation",
      zlab="\n\n\nDensity", shade=0.4)
interpolation <- interp(tc, veg_r, predict, duplicate="mean")
persp(interpolation, theta = -45, phi = 30, ticktype = "detailed", nticks=4,
      cex=0.8, expand=0.5, xlab="\n\n\nTemperature", ylab="\n\n\nVegetation",
      zlab="\n\n\nDensity", shade=0.4)

Now as you can see, and when exported as EPS, the axis annotations at the tickmarks overlap with the z axis. As it's for publication, the font should be this big. I could move the axis labels away from the axes, but cannot find how to place the axis annotations further from the axes, or whether it's even possible to adjust the distance between an axis and its annotation. Your help is much appreciated!
[R] ARIMA standard error
Hi, Can anyone explain how the standard error in arima() is calculated? Also, how can I extract it from the Arima object? I don't see it in there.

x <- rnorm(1000)
a <- arima(x, order = c(4, 0, 0))
a

Call:
arima(x = x, order = c(4, 0, 0))

Coefficients:
          ar1     ar2     ar3      ar4  intercept
      -0.0451  0.0448  0.0139  -0.0688     0.0010
s.e.   0.0316  0.0316  0.0317   0.0316     0.0296

sigma^2 estimated as 0.9775:  log likelihood = -1407.56,  aic = 2827.12

names(a)
 [1] "coef"      "sigma2"    "var.coef"  "mask"      "loglik"    "aic"
 [7] "arma"      "residuals" "call"      "series"    "code"      "n.cond"
[13] "model"

Thanks, Gad -- Gad Abraham Department of Mathematics and Statistics The University of Melbourne Parkville 3010, Victoria, Australia email: [EMAIL PROTECTED] web: http://www.ms.unimelb.edu.au/~gabraham
Re: [R] ARIMA standard error
Hi Gad, try:

class(a)
[1] "Arima"
getAnywhere(print.Arima)
...

Cheers, Andrew

On Fri, Mar 16, 2007 at 01:47:25PM +1100, Gad Abraham wrote: [original question quoted above]

-- Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-9763 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 http://www.ms.unimelb.edu.au/~andrewpr http://blogs.mbs.edu/fishing-in-the-bay/
Re: [R] ARIMA standard error
Andrew Robinson wrote: Hi Gad, try: class(a); [1] "Arima"; getAnywhere(print.Arima) ...

Thanks Andrew. For the record, the standard error is the square root of the diagonal of the covariance matrix a$var.coef (itself obtained through some magic):

ses[x$mask] <- round(sqrt(diag(x$var.coef)), digits = digits)

Cheers, Gad -- Gad Abraham Department of Mathematics and Statistics The University of Melbourne Parkville 3010, Victoria, Australia email: [EMAIL PROTECTED] web: http://www.ms.unimelb.edu.au/~gabraham
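So, without going through print.Arima at all, the s.e. row can be reproduced directly from the fitted object (a sketch; the seed is arbitrary):

```r
# Standard errors of ARIMA coefficients, straight from var.coef
set.seed(1)
x <- rnorm(1000)
a <- arima(x, order = c(4, 0, 0))

se <- sqrt(diag(a$var.coef))
rbind(coef = coef(a), s.e. = se)  # mirrors the table printed by print.Arima
```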
[R] Complex superscript plot text
Hi all: I would like to create a line of plot margin text (using mtext()) that features both a superscript and a subset of an object. However, I cannot seem to do both things at once, nor have I found a way to paste the two results together. (I pull the object subset because this is part of a for-next loop to make multiple graphs.) This example doesn't give me what I want from mtext():

GoodnessOfFits <- c(1, 0.75)
graphNumber <- 1  # first loop of the for-next loop (not shown)
x <- seq(-10, 10, 1)
y <- x^2
plot(x, y)
lines(x, predict(lm(y ~ I(x^2))))
mtext(text = expression(R^2 * ": " * GoodnessOfFits[graphNumber]))

I recognize that in this example I could extract the R-squared value from the model in each loop; however, this does not apply to my more complicated real scenario. Any suggestions? Thanks. --Bob Farmer
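One approach that I believe solves this is bquote(): expression() never evaluates its argument, so GoodnessOfFits[graphNumber] stays as literal plotmath, whereas bquote() substitutes the current value of anything wrapped in .( ) into the expression. A sketch using the example data above:

```r
# bquote() mixes plotmath markup with evaluated values: .(obj) is
# replaced by the current value of obj before plotting
GoodnessOfFits <- c(1, 0.75)
graphNumber <- 1
x <- seq(-10, 10, 1)
y <- x^2
plot(x, y)
lines(x, predict(lm(y ~ I(x^2))))
mtext(bquote(R^2 * ": " * .(GoodnessOfFits[graphNumber])))
```

Inside a loop, each iteration's value of graphNumber is picked up automatically. An equivalent is substitute(R^2 * ": " * v, list(v = GoodnessOfFits[graphNumber])).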
[R] error code 5 from Lapack routine 'dsyevr'
While using the rmvnorm function, I get the error: Error in eigen(sigma, sym = TRUE) : error code 5 from Lapack routine 'dsyevr'. The same thing happens when I try the eigen() function on my covariance matrix. The matrix is a symmetric 111x111 matrix. Well, it is almost symmetric; there are slight deviations from symmetry (the largest is 3e-18). I have this in an MCMC loop, and it happens about once in every 40 iterations or so. What does this error mean? Thanks. -- Richard D. Morey, M.A. Research Assistant, Perception and Cognition Lab University of Missouri-Columbia
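A common workaround for near-symmetric covariance matrices (an assumption on my part, not a guaranteed cure for the sporadic LAPACK failure) is to force exact symmetry before the decomposition, since deviations on the order of 3e-18 are pure floating-point noise. A minimal sketch with a small simulated matrix:

```r
# Symmetrize a nearly-symmetric covariance matrix before eigen()
set.seed(1)
A <- matrix(rnorm(9), 3, 3)
sigma <- crossprod(A)                # a covariance-like positive matrix

sigma.sym <- (sigma + t(sigma)) / 2  # averages away any rounding asymmetry
ev <- eigen(sigma.sym, symmetric = TRUE)$values
stopifnot(isSymmetric(sigma.sym))    # now exactly symmetric
```

The same one-liner, sigma <- (sigma + t(sigma))/2, applied inside the MCMC loop before calling rmvnorm may avoid the error.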