Re: [R] Finding distance matrix for categorical data
Thanks, guys. I am able to generate the distance matrix for mixed column values (categorical and ordinal) using the daisy function. But can anyone tell me how to generate clusters out of it? The point being, I don't know the number of clusters beforehand.

Let me give an overview of the problem I am trying to solve. Given a dataset, something like:

             var1   var2  var3     Size
  element1-1  yes    x    present   100
  element1-2  no     y    absent    294
  element1-3  maybe  x    absent     45

with the first three variables categorical and the last one ordinal, I need to do the following:

1) Generate clusters out of it (let's say these are training clusters). I am able to compute the distance matrix (using daisy), but I am not sure how to create an unknown number of clusters; does dbscan work on a distance matrix?
2) Once that is done, I want to place some new data points in the above space (let's say these are test points).
3) Find out which test points lie within the boundary of any of the training clusters discovered above.

If anyone knows how to get this done, please let me know. It's for an academic project and I am unable to make any progress.

Thanks and regards,
K

From: Ingmar Visser i.vis...@uva.nl
Sent: Fri, 11 June, 2010 2:19:33 PM
Subject: Re: [R] Finding distance matrix for categorical data

Latent class analysis may be more appropriate, depending on your hypotheses.
Best, Ingmar

All, how can we find a distance matrix for categorical data? I.e., given a CSV like:

             var1   var2  var3  var4
  element1-1  yes    x     a     k
  element1-2  no     y     b     l
  element1-3  maybe  y     c     m

how can I compute the distance matrix between all the elements? Actually I need it to create clusters on top of it.
Thanks, regards, Kapil

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
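One possible route, as a sketch with made-up data shaped like the example above (the cluster package ships with R): compute a Gower dissimilarity with daisy(), then pick the number of clusters by maximising the average silhouette width over pam() fits, so k does not have to be fixed in advance.

```r
library(cluster)
set.seed(42)

# invented data mirroring the poster's columns
dat <- data.frame(
  var1 = factor(sample(c("yes", "no", "maybe"), 30, replace = TRUE)),
  var2 = factor(sample(c("x", "y"), 30, replace = TRUE)),
  var3 = factor(sample(c("present", "absent"), 30, replace = TRUE)),
  Size = c(rnorm(15, 50, 5), rnorm(15, 250, 20))
)

# Gower dissimilarity handles mixed (categorical + numeric) columns
d <- daisy(dat, metric = "gower")

# choose k by maximising the average silhouette width over pam() fits
avg.sil <- sapply(2:8, function(k) pam(d, k)$silinfo$avg.width)
best.k  <- (2:8)[which.max(avg.sil)]
cl      <- pam(d, best.k)$clustering
```

pam() also returns medoids, so a new (test) point could be assigned to the cluster of its nearest medoid, which speaks to steps 2 and 3. If a density-based method is preferred, fpc::dbscan() is said to accept a distance matrix via its method = "dist" argument, though that is worth checking against the fpc documentation.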
Re: [R] Cforest and Random Forest memory use
Answers added below. Thanks again, Matt

On 11 June 2010 14:28, Max Kuhn mxk...@gmail.com wrote:

Also, you have not said:
- your OS: Windows Server 2003 64-bit
- your version of R: 2.11.1 64-bit
- your version of party: 0.9-9995
- your code:
    test.cf <- cforest(formula = badflag ~ ., data = example,
                       control = cforest_control(teststat = 'max',
                           testtype = 'Teststatistic', replace = FALSE,
                           ntree = 500, savesplitstats = FALSE, mtry = 10))
- what "large data set" means: 1 million observations, 40+ variables, around 200 MB
- what "very large model objects" means
- anything which breaks

So... how is anyone supposed to help you?

Max
Re: [R] lmer() with no intercept
John,

Why would you want to fit the model without an intercept if you seemingly need it? Anyway, I assume that the intercept from your first model just moves into the random effects -- you have intercepts there for worker and day, so either of these (or both) will absorb it. It is no surprise that the estimates for the covariates differ only slightly; it should be that way.

What you plot (your second call to panel.lines) is not the correct model, as you omit the intercepts for worker and day (which are 0, or at least pretty close to it, only if you include the overall intercept in your model). That's why your red line is at the top edge (note that the intercept is negative). I'm therefore not sure that the model without an intercept makes a lot of sense, but you might consider posting related questions to the mixed-models SIG, where you could get more erudite comments than from me.

HTH, Michael

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of array chip
Sent: Saturday, June 12, 2010 1:07
To: r-help@r-project.org
Subject: [R] lmer() with no intercept

Hi, I asked this before but haven't got any response, so I would like to have another try. Thanks for the help. I also tried twice to join the model mailing list so that I can ask questions there, but still haven't got permission to join that list yet.

===

Hi, I am wondering how I can specify no intercept in a mixed model using lmer(). An example dataset is attached (test.txt). Three workers, on 5 days, measured a response variable y on an independent variable x. I want to use a quadratic term (x2 in the dataset) to model the relationship between y and x.
test <- read.table('test.txt', sep = '\t', header = TRUE)

If I simply use lm() and ignore worker and day, so that I can try a linear regression both with and without an intercept, here is what I get:

lm(y ~ x + x2, data = test)
Coefficients:
(Intercept)            x           x2
 -1.7749104    0.1099160   -0.0006152

lm(y ~ x + x2 - 1, data = test)
Coefficients:
         x           x2
 0.0490097   -0.0001962

Now I want to try a mixed model with worker and day as random effects. With an intercept:

lmer(y ~ x + x2 + (1|worker) + (1|day), data = test)
Fixed effects:
              Estimate Std. Error t value
(Intercept) -1.324e+00  4.490e-01  -2.948
x            1.117e-01  8.563e-03  13.041
x2          -6.357e-04  7.822e-05  -8.127

Without an intercept:

lmer(y ~ x + x2 + (1|worker) + (1|day) - 1, data = test)
Fixed effects:
     Estimate Std. Error t value
x   1.107e-01  8.528e-03  12.981
x2 -6.304e-04  7.805e-05  -8.077

It seems to work fine. But if you look at the fixed-effect coefficients of both mixed models, the coefficients for x and x2 are not much different regardless of whether an intercept is included. This is not the case for the simple linear regressions using lm() above. If I plot all 4 models:

xyplot(y ~ x, groups = worker, data = test, col.line = 'grey', lwd = 2,
       panel = function(x, y) {
         panel.xyplot(x, y, type = 'p')
         x <- sort(x)
         panel.lines(x, -1.324 + 0.1117*x - 0.0006357*x*x)
         panel.lines(x, 0.1107*x - 0.0006304*x*x, col = 'red')
         panel.lines(x, 0.04901*x - 0.0001962*x*x, col = 'blue')
         panel.lines(x, -1.7749 + 0.10992*x - 0.0006152*x*x, col = 'green')
       })

As you can see, the mixed model without an intercept (red line) does not fit the data very well (it is at the top edge of the data instead of in the middle), so I guess I did something wrong here. Can anyone make any suggestions?

Thanks, John
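To see the intercept-absorption point concretely, here is a small simulation (invented data, not the poster's test.txt, which is unavailable here) showing that when "-1" is used, the worker/day random intercepts soak up the grand mean and the x and x2 estimates barely move; lme4 is assumed to be installed.

```r
set.seed(1)
worker <- factor(rep(1:3, each = 50))
day    <- factor(rep(1:5, length.out = 150))
x  <- runif(150, 0, 100)
x2 <- x^2
# true curve has intercept -1.3; worker/day add small random shifts
y  <- -1.3 + 0.11 * x - 6e-4 * x2 +
      rnorm(3, 0, 0.5)[worker] + rnorm(5, 0, 0.5)[day] + rnorm(150, 0, 0.5)
sim <- data.frame(y, x, x2, worker, day)

if (requireNamespace("lme4", quietly = TRUE)) {
  library(lme4)
  m1 <- lmer(y ~ x + x2 + (1 | worker) + (1 | day), data = sim)
  m0 <- lmer(y ~ x + x2 + (1 | worker) + (1 | day) - 1, data = sim)
  print(fixef(m1))   # has an (Intercept) term
  print(fixef(m0))   # x, x2 hardly change ...
  print(ranef(m0))   # ... because the random intercepts absorb the mean
}
```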
Re: [R] sharing experience - installing R Spatial Views
On Sat, Jun 12, 2010 at 2:37 AM, Hendro Wibowo hendrohwib...@gmail.com wrote:

Hi guys, I would like to share my experience installing the Spatial task view packages for R. I could not install 32 packages which are part of the Spatial view, and I used Google search after search to solve all those problems over about two days. I hope somebody may benefit from my experience. I admit that I do not have excellent programming skills at all, so perhaps some of the steps I took were not necessary to solve the problem at hand. But it works, and I am happy.

I was lucky when a problem had already been solved by someone else, or when the problem was just a matter of installing another R package that depends on other packages. Sometimes it had nothing to do with an R package but with a package that has to be installed in the operating system (in my case Linux Mint 8, based on Ubuntu Karmic), so I just guessed the nature of the problem and tried to solve it via an already-solved similar problem. I found many problems solved, directly or indirectly, on the Nabble - OSGeo FOSS4G websites. Of course many problems were also solved via other websites, but I forgot their addresses as I searched again and again. I do appreciate everyone who posts their problems, solved or not, because that let me solve my own, either directly or indirectly (by using the ideas from already-solved problems).

I found that we need to change from gcc version 4 to 3 in order for some packages to install (I forgot which one). The script I used to achieve that, taken from the internet, is placed at the bottom of this message. And finally, below are lists of problems and solutions from installing the Spatial view. . . .
SNIP

The main complication here seems to be that packages (obviously) have dependencies, and that Task View packages in particular have MANY dependencies which cannot be resolved by R but only at the operating-system level (on Linux and Mac -- Windows is a different story). To make installation easier, would it be possible to list these dependencies on the website of the task views, and possibly even provide commands for the most widely used Linux distros for installing them? This could save quite some time when installing those packages.

Cheers, Rainer

--
NEW GERMAN FAX NUMBER!!!
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)
Centre of Excellence for Invasion Biology, Natural Sciences Building, Office Suite 2039, Stellenbosch University Main Campus, Merriman Avenue, Stellenbosch, South Africa
Cell: +27 - (0)83 9479 042 | Fax: +27 - (0)86 516 2782 | Fax: +49 - (0)321 2125 2244
email: rai...@krugs.de | Skype: RMkrug | Google: r.m.k...@gmail.com
[R] random colour
Sir, I want to plot 5 curves on a single graph, and I want to give them random colours. How can I do this?

Regards, Suman Dhara
Re: [R] using latticeExtra plotting confidence intervals
So if the categories were race (A and B) and I had a male and a female nested in each group, that would give me 4 different data points, but I want two data points within each panel.

Joe King, 206-913-2912, j...@joepking.com
"Never throughout history has a man who lived a life of ease left a name worth remembering." --Theodore Roosevelt

-----Original Message-----
From: Deepayan Sarkar [mailto:deepayan.sar...@gmail.com]
Sent: Sunday, June 13, 2010 10:15 PM
To: Joe P King
Cc: r-help@r-project.org
Subject: Re: [R] using latticeExtra plotting confidence intervals

On Sun, Jun 13, 2010 at 10:10 AM, Joe P King j...@joepking.com wrote:

I want to plot a 95% confidence band using segplot, but with groups. For example, if I have males and females within different races, I want the racial groups in different panels. I have this minor code, completely made up, that gets at what I want: 4 random samples and 4 confidence intervals. I know how to get A and B into one panel and C and D into another, but how do I get the x axis to label them properly and have them categorized as two? I am not sure what to put on the left side of the formula. This is the example code:

(1) Your code results in

length(categories)
[1] 2
length(mu)
[1] 4

which makes the formula mu ~ ci.upper + ci.lower | categories meaningless.

(2) You are effectively plotting xyplot(mu ~ mu | categories), with additional confidence intervals in one direction. I'm sure that's not what you want, but it's not clear what it is that you do actually want.
-Deepayan

library(lattice)
library(latticeExtra)
sample1 <- rnorm(100, 10, 2)
sample2 <- rnorm(100, 50, 3)
sample3 <- rnorm(100, 20, 2)
sample4 <- rnorm(100, 40, 1)
mu1 <- mean(sample1); ci.upper1 <- mu1 + 2*2; ci.lower1 <- mu1 - 2*2
mu2 <- mean(sample2); ci.upper2 <- mu2 + 2*3; ci.lower2 <- mu2 - 2*3
mu3 <- mean(sample3); ci.upper3 <- mu3 + 2*2; ci.lower3 <- mu3 - 2*2
mu4 <- mean(sample4); ci.upper4 <- mu4 + 2*1; ci.lower4 <- mu4 - 2*1
categories <- c("A", "B")
mu <- cbind(mu1, mu2, mu3, mu4)
ci.upper <- cbind(ci.upper1, ci.upper2, ci.upper3, ci.upper4)
ci.lower <- cbind(ci.lower1, ci.lower2, ci.lower3, ci.lower4)
segplot(mu ~ ci.upper + ci.lower | categories, centers = mu, horizontal = FALSE)

I also tried this:

seq1 <- seq(1, 4, 1)
segplot(seq1 ~ ci.upper + ci.lower | categories, centers = mu, horizontal = FALSE)

but it also gives a poor x axis. I know this is probably an elementary problem, but any help would be greatly appreciated. Here's my data structure, sorry for bombarding with all the code:

structure(c(9.85647167881417, 50.1856561919426, 19.8477661576365, 39.8575819498655, 13.8564716788142, 56.1856561919426, 23.8477661576365, 41.8575819498655, 5.85647167881417, 44.1856561919426, 15.8477661576365, 37.8575819498655), .Dim = c(1L, 12L), .Dimnames = list(NULL, c("mu1", "mu2", "mu3", "mu4", "ci.upper1", "ci.upper2", "ci.upper3", "ci.upper4", "ci.lower1", "ci.lower2", "ci.lower3", "ci.lower4")))

---
Joe King, M.A., Ph.D. Student, University of Washington - Seattle
Office: 404A Miller Hall, 206-913-2912, j...@joepking.com
"Never throughout history has a man who lived a life of ease left a name worth remembering." --Theodore Roosevelt
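One way to get the kind of plot described is to keep one row per group in a data frame and put a factor on the left of the formula, so each panel gets labelled categories on its x axis. A sketch with made-up numbers (latticeExtra assumed installed):

```r
# one row per (race, sex) cell; all numbers are invented
est <- data.frame(
  sex  = factor(rep(c("male", "female"), times = 2)),
  race = factor(rep(c("A", "B"), each = 2)),
  mu   = c(10, 50, 20, 40),
  se   = c(2, 3, 2, 1)
)
est$lower <- est$mu - 2 * est$se
est$upper <- est$mu + 2 * est$se

if (requireNamespace("latticeExtra", quietly = TRUE)) {
  library(latticeExtra)
  # one panel per race; male/female labelled on the x axis of each panel
  print(segplot(sex ~ lower + upper | race, data = est,
                centers = est$mu, horizontal = FALSE))
}
```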
[R] Odp: random colour
Hi,

Would you like to know which colour is for which curve? If yes, use the parameter col = 1:5 in your plotting call. If not, you can try

col = sample(colours(), 5)

instead.

Regards, Petr

r-help-boun...@r-project.org napsal dne 14.06.2010 07:05:43:

Sir, I want to plot 5 curves on a single graph with random colours. How can I do this? Regards, Suman Dhara
Re: [R] meta analysis with repeated measure-designs?
Hi,

thanks for the references. I will try the sensitivity analysis in R and try out WinBUGS if that does not work (a little afraid of switching programmes). I also had an idea for a reasonable estimate of the correlations: some studies report both results from paired t-tests and means and SDs, and thus allow two estimates of d to be calculated, one based on M and SD alone, the other on t. The difference between the two estimates should be systematically related to the correlation of the measures. I will keep you posted if I have a solution or hit a wall. Efcharisto and dank je wel!

Gerrit

On 12.06.2010, at 15:59, Viechtbauer Wolfgang (STAT) wrote:

Dear Gerrit,

the most appropriate approach for data of this type would be a proper multivariate meta-analytic model (along the lines of Kalaian & Raudenbush, 1996). Since you do not know the correlations of the reaction-time measurements across conditions for the within-subject designs, a simple solution is to guesstimate those correlations and then conduct sensitivity analyses to make sure your conclusions do not depend on those guesstimates.

Best,
--
Wolfgang Viechtbauer, http://www.wvbauer.com/
Department of Methodology and Statistics, School for Public Health and Primary Care, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands
Tel: +31 (0)43 388-2277 | Office: Room B2.01 (second floor), Debyeplein 1 (Randwyck)

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gerrit Hirschfeld
Sent: Saturday, June 12, 2010 12:45
To: r-help@r-project.org
Subject: [R] meta analysis with repeated measure-designs?

Dear all,

I am trying to run a meta-analysis of psycholinguistic reaction-time experiments with the meta package. The problem is that most of the studies have within-subject designs and use repeated-measures ANOVAs to analyze their data. So at present it seems there are three non-optimal ways to run the analysis:

1.
Using metacont() to estimate effect sizes and standard errors. But as the different scores are dependent, this would result in biased estimators (Dunlap, 1996). Suppose I had the correlations of the measures (which I do not) -- would there be an option to use them in metacont()?

2. Use metagen() with an effect size that is based on the reported F for the contrasts, but that has other disadvantages (Bakeman, 2005). The problem I am having with this is that I could not find a formula to compute the standard error of partial eta squared. Any ideas?

3. Use metagen() with r computed from p-values (Rosenthal, 1994) as the effect size, with the problem that sample size affects p as much as effect size does.

Is there a fourth way, or data showing that the correlations can be neglected as long as they are assumed to be similar across studies? Any ideas are much appreciated.

Best regards, Gerrit

__
Gerrit Hirschfeld, Dipl.-Psych.
Psychologisches Institut II, Westfälische Wilhelms-Universität, Fliednerstr. 21, 48149 Münster, Germany
psycholinguistics.uni-muenster.de | GerritHirschfeld.de
Fon: +49 (0) 251 83-31378 | Fon: +49 (0) 234 7960728 | Fax: +49 (0) 251 83-34104
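To make the guesstimate-plus-sensitivity route concrete, here is a sketch using the metafor package (a different package from the meta package the poster mentions, and all numbers are invented): escalc(measure = "SMCC") computes the standardized mean change given an assumed pre-post correlation ri, which can then be varied over plausible values.

```r
# invented summary data from three within-subject studies (RT in ms)
m1 <- c(520, 480, 550); sd1 <- c(60, 55, 70)
m2 <- c(500, 470, 530); sd2 <- c(58, 50, 65)
n  <- c(24, 30, 18)

if (requireNamespace("metafor", quietly = TRUE)) {
  library(metafor)
  # sensitivity analysis over guesstimated correlations
  for (r.guess in c(0.3, 0.5, 0.7)) {
    es  <- escalc(measure = "SMCC", m1i = m1, m2i = m2,
                  sd1i = sd1, sd2i = sd2, ni = n,
                  ri = rep(r.guess, 3))
    fit <- rma(yi, vi, data = es)
    cat(sprintf("assumed r = %.1f: estimate %.3f (SE %.3f)\n",
                r.guess, coef(fit), fit$se))
  }
}
```

If the conclusions are stable across the assumed correlations, the missing-correlation problem is less of a worry.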
[R] How to convert data frame row to vector?
Hi,

I have a data frame that looks like this:

   ID  Var Var2 Var3
  xxx  100  909  920
  yyy  110  720  710
  zzz  140  680  690

I can load the file and produce a plot from the values of one column:

data <- read.csv("test.csv")
barplot(data$Var2)

So far it's fine. In reality, however, the data frame has more columns with integer values, and I want to plot consecutive values of the SAME ROW. Something like:

barplot(data[1, 2:3])

or, more generically:

barplot(data[row.id, from.field:to.field])

Unfortunately this is no longer a vector! I understand that it could contain non-numeric values, which in my data frame is not the case. How can I convert a horizontal one-line data frame into a vector?

Any help appreciated, best regards, Jörg
Re: [R] random colour
On 06/14/2010 03:05 PM, suman dhara wrote:

Sir, I want to plot 5 curves on a single graph with random colours. How can I do this?

Hi Suman,

col <- rgb(runif(5), runif(5), runif(5))

Jim
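Jim's line in a complete example, plotting five made-up curves with matplot():

```r
set.seed(7)
x <- seq(0, 10, length.out = 100)
curves <- sapply(1:5, function(i) sin(x + i) + 0.2 * i)  # five curves
cols <- rgb(runif(5), runif(5), runif(5))                # five random colours
matplot(x, curves, type = "l", lty = 1, lwd = 2, col = cols,
        ylab = "y", main = "Five curves in random colours")
```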
[R] discrete (binary) choice panel data
Dear all,

I haven't used R for panel data analysis so far, but now I am looking for a package, or at least some starting point, for binary-choice panel data analysis in R. For starters, most of my effects are individual and my dependent variable is just binary. Thanks for any suggestions in advance!

best, matt
[R] Odp: How to convert data frame row to vector?
Hi,

r-help-boun...@r-project.org napsal dne 14.06.2010 11:05:39:

Hi, I have a data frame ... and I want to plot consecutive values of the SAME ROW. Something like barplot(data[1, 2:3]) or, more generically, barplot(data[row.id, from.field:to.field]). Unfortunately this is no longer a vector!

E.g. unlist:

barplot(unlist(data[row.id, from.field:to.field]))

Regards, Petr
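A self-contained version of the answer, using the data from the question; unlist() keeps the column names (which barplot uses as bar labels), while as.numeric() drops them:

```r
df <- data.frame(ID   = c("xxx", "yyy", "zzz"),
                 Var  = c(100, 110, 140),
                 Var2 = c(909, 720, 680),
                 Var3 = c(920, 710, 690))

v1 <- unlist(df[1, 2:4])      # named numeric vector: Var Var2 Var3
v2 <- as.numeric(df[1, 2:4])  # plain numeric vector, names dropped

barplot(v1)                   # works now, one bar per column of row 1
```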
Re: [R] Clustering algorithms don't find obvious clusters
Thank you Etienne, this seems to work like a charm. Also thanks to the rest of you for your help. Henrik

On 11 June 2010 13:51, Cuvelier Etienne ecuscim...@gmail.com wrote:

On 11/06/2010 12:45, Henrik Aldberg wrote:

I have a directed graph which is represented as a matrix of the form

0 4 0 1
6 0 0 0
0 1 0 5
0 0 4 0

Each row corresponds to an author (A, B, C, D) and the values say how many times this author has cited the other authors. Hence the first row says that author A has cited author B four times and author D one time. Thus the matrix represents two groups of authors, (A, B) and (C, D), who cite each other, with a weak link between the groups. In reality this matrix is much bigger and very sparse, but it still consists of distinct groups of authors. My problem is that when I cluster the matrix using pam, clara or agnes, the algorithms do not find the obvious clusters. I have tried to turn it into a dissimilarity matrix before clustering, but that did not help either. The layout of the clustering is not that important to me; my primary interest is to get the right nodes into the right clusters.

Hello Henrik,

You can do graph clustering using the igraph package. Example:

library(igraph)
simM <- NULL
simM <- rbind(simM, c(0, 4, 0, 1))
simM <- rbind(simM, c(6, 0, 0, 0))
simM <- rbind(simM, c(0, 1, 0, 5))
simM <- rbind(simM, c(0, 0, 4, 0))
G <- graph.adjacency(simM, weighted = TRUE, mode = "directed")
plot(G, layout = layout.kamada.kawai)

### walktrap.community
wt <- walktrap.community(G, modularity = TRUE)
wmemb <- community.to.membership(G, wt$merges,
                                 steps = which.max(wt$modularity) - 1)
V(G)$color <- rainbow(3)[wmemb$membership + 1]
plot(G)

I hope it helps,
Etienne

Sincerely, Henrik
[R] Install Rmpi
Hi everyone,

As I couldn't succeed with a manual installation of Rmpi, I decided to start again from the beginning. I removed R and MPICH from my Ubuntu Hardy installation. Then, to avoid any dependency problems, I installed MPICH and R from Synaptic, not from sources. But now I can't install Rmpi. An error message appears when trying to install Rmpi; you can find it at http://ubuntuone.com/p/71x/

Is it time to upgrade to the latest Ubuntu version and build a new system? Any help would be greatly appreciated.

--
Francisco Pastor, Meteorology department, Fundación CEAM
p...@ceam.es | http://www.ceam.es/ceamet | http://www.ceam.es
Parque Tecnologico, C/ Charles R. Darwin, 14, 46980 PATERNA (Valencia), Spain
Tlf. 96 131 82 27 - Fax. 96 131 81 90
Usuario Linux registrado: 363952
Re: [R] Install Rmpi
On Mon, Jun 14, 2010 at 01:12:05PM +0200, Paco Pastor wrote:

As I couldn't succeed with a manual installation of Rmpi I decided to start again from the beginning. ... But now I can't install Rmpi.

All you need is

sudo apt-get install r-cran-rmpi

but that will rely on Open MPI. Any reason you need MPICH?

Dirk

--
Three out of two people have difficulties with fractions.
Re: [R] Finding an order for an hclust (dendrogram) object without intersections
OK, I found a working solution for this. If anyone faces this problem in the future, here is something that works (although it is probably not the best solution out there).

Best, Tal

#--
problematic.tree <- structure(list(merge = structure(c(-3, -24, 1, -25, 4, -27, 5, -22, 8, 3, 9, -5, 10, 11, 12, 6, 14, 7, 18, 19, 20, 21, 22, 23, 24, 25, -13, -23, -14, -21, -20, -26, -19, -1, 2, -15, -7, -4, -9, -18, -6, -17, -12, 17, -2, -8, -10, -16, 13, 15, 16, -11), .Dim = c(26L, 2L)), height = c(0.0833, 0.0867, 0.117, 0.136507936507937, 0.220634920634921, 0.622, 0.674603174603175, 0.823, 1.06349206349206, 1.27698412698413, 1.37, 2.00952380952381, 2.2975, 2.39, 2.686667, 2.9, 3.14736842105263, 3.55634920634921, 3.7921768707483, 3.84183673469388, 3.93817373103087, 4.54464285714286, 4.81438464274599, 5.10895156778615, 5.36142237562854, 6.19122779197967), order = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 11L), labels = 1:27), .Names = c("merge", "height", "order", "labels"), class = "hclust")

plot(problematic.tree)

order.a.tree <- function(tree) {
  num.of.leafs <- length(tree$order)
  matrix.to.order <- NULL
  for (i in 1:num.of.leafs) {
    matrix.to.order <- cbind(matrix.to.order, cutree(tree, k = i))
  }
  id <- seq_len(dim(matrix.to.order)[1])
  matrix.to.order <- data.frame(id, matrix.to.order)
  print(matrix.to.order)
  # the ordering engine of this function is based on orderBy from the doBy package
  require(doBy)
  ordered.matrix <- eval(parse(text = paste("orderBy(~",
      paste(colnames(matrix.to.order)[-1], collapse = "+"),
      ", data = matrix.to.order)")))
  print(ordered.matrix)
  tree$order <- ordered.matrix$id
  return(tree)
}

plot(order.a.tree(problematic.tree))
#--

Contact details: tal.gal...@gmail.com | 972-52-7275845
www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Sun, Jun 13, 2010 at 8:42 PM, Tal Galili tal.gal...@gmail.com wrote: OK, I found an example where my
algorithm can't fix the tree order, but I don't know how to resolve it. Here is the code to reproduce the problem:

#
order.a.tree <- function(tree) {
  num.of.leafs <- length(tree$order)
  for (i in 1:num.of.leafs) {
    tree$order <- order(cutree(tree, k = i))
  }
  return(tree)
}

problematic.tree <- structure(list(merge = structure(c(-3, -24, 1, -25, 4, -27, 5, -22, 8, 3, 9, -5, 10, 11, 12, 6, 14, 7, 18, 19, 20, 21, 22, 23, 24, 25, -13, -23, -14, -21, -20, -26, -19, -1, 2, -15, -7, -4, -9, -18, -6, -17, -12, 17, -2, -8, -10, -16, 13, 15, 16, -11), .Dim = c(26L, 2L)), height = c(0.0833, 0.0867, 0.117, 0.136507936507937, 0.220634920634921, 0.622, 0.674603174603175, 0.823, 1.06349206349206, 1.27698412698413, 1.37, 2.00952380952381, 2.2975, 2.39, 2.686667, 2.9, 3.14736842105263, 3.55634920634921, 3.7921768707483, 3.84183673469388, 3.93817373103087, 4.54464285714286, 4.81438464274599, 5.10895156778615, 5.36142237562854, 6.19122779197967), order = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 11L), labels = 1:27), .Names = c("merge", "height", "order", "labels"), class = "hclust")

plot(order.a.tree(problematic.tree))
#

Thanks, Tal

Contact details: tal.gal...@gmail.com | 972-52-7275845
www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Sun, Jun 13, 2010 at 7:39 PM, Tal Galili tal.gal...@gmail.com wrote: Thanks Charles. In the meantime, I found that the following code does the trick. But I am wondering: 1) whether I might have made a mistake in it somewhere, and 2) whether there are other (smarter) ways of going about this.
Here is the solution I wrote:

# -
order.a.tree <- function(tree) {
  num.of.leafs <- length(tree$order)
  for (i in 2:(num.of.leafs - 1)) {
    tree$order <- order(cutree(tree, k = i))
  }
  return(tree)
}

# Example:
a <- list()  # initialize empty object
# define merging pattern:
# negative numbers are leaves,
# positive are merged clusters (defined by row number in $merge)
a$merge <- matrix(c(-1, -2, -3, -4,
[R] Html help
I have just installed R 2.11.1 on my XP laptop. I like HTML help for browsing but text help for on-the-fly look-ups, so I was a bit surprised to be asked to choose between them during the installation. I chose text, thinking I could fix the HTML help later, which is what I am trying to do now.

Now when I ask for HTML help, my browser goes to 'http://-ip-number-/doc/html/index.html' instead of where I want on my computer: C:\apps\R\R-2.11.1\doc\html\index.html. I can go where I want manually, but then the package list in C:\apps\R\R-2.11.1\doc\html\packages.html does not include all the packages that I have installed and linked. I don't want to read my HTML help from the web, because sometimes I am off-line or on a slow connection. How do I go about getting a local set of HTML help files?

Cheers, Murray Jorgensen

--
Dr Murray Jorgensen, http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: m...@waikato.ac.nz, majorgen...@ihug.co.nz
Fax 7 838 4155 | Phone +64 7 838 4773 | Home +64 7 825 0441 | Mobile 021 0200 8350
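For what it's worth: since R 2.10, the HTML pages are generated on the fly by R's own HTTP server on the local machine, so the http:// address with an IP number is normally served locally (from 127.0.0.1), not fetched from the web, and it covers all installed packages. A way to make HTML help the default (the options() call can go in Rprofile.site or ~/.Rprofile):

```r
# make HTML help the default for help() look-ups
options(help_type = "html")

# help.start() opens the index page; the http://127.0.0.1:<port>/ URL it
# uses is R's internal local help server, built from the installed
# packages -- no network connection is needed
if (interactive()) help.start()
```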
[R] Large Data
Hi, I want to import a 1.5 GB CSV file into R, but the following error comes up: 'vector allocation 12.4 size' How can I read the large CSV file in R? Can anyone help me? -- View this message in context: http://r.789695.n4.nabble.com/Large-Data-tp2254130p2254130.html Sent from the R help mailing list archive at Nabble.com.
[R] Design of experiments for Choice-Based Conjoint Analysis (CBC)
Hello, I would like to know if there is any function in R which makes it possible to construct designs of experiments for Choice-Based Conjoint studies. I have already checked the design-of-experiments topic for R and looked at the different libraries. I tried to build my design with the optFederov function, but I haven't found how to obtain a balanced design (with the same number of cases for each level of the factors)... So, if someone has already dealt with that case, any help would be appreciated ^^ Thanks. A.D. -- View this message in context: http://r.789695.n4.nabble.com/Design-of-experiments-for-Choice-Based-Conjoint-Analysis-CBC-tp2254077p2254077.html Sent from the R help mailing list archive at Nabble.com.
[R] how to change default help settings from factory default html
Hi all, Apologies if this is a trivial question - I have searched the lists and the online help files etc. but have not managed to find anything. I recently downloaded the latest version of R, which has the help type set to htmlhelp as default (according to http://127.0.0.1:18380/library/utils/html/help.html). I would very much like to be able to access the help files when I am offline by typing ?topic etc., as I used to with the previous R version. Any suggestions? Thanks Katya ### UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and e-mail disclaimer published on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 21 650 4500. This e-mail is intended only for the person(s) to whom it is addressed. If the e-mail has reached you in error, please notify the author. If you are not the intended recipient of the e-mail you may not use, disclose, copy, redirect or print the content. If this e-mail is not related to the business of UCT it is sent by the sender in the sender's individual capacity. ###
[R] list matching
Hello, I could not find a clear solution for the following question, so please allow me to ask. mynames=cbind(c('a','b'),c(11,22)) lst=list(a=c(1,2), b=5) Now I try to combine mynames and lst into: a 1 11 a 2 11 b 5 22 Thanks, Jian
Re: [R] Large Data
And this one is only from last week. Please read the posting guides carefully. Cheers Joris -- Forwarded message -- From: Joris Meys jorism...@gmail.com Date: Sat, Jun 5, 2010 at 11:04 PM Subject: Re: [R] What is the largest in memory data object you've worked with in R? To: Nathan Stephens nwsteph...@gmail.com Cc: r-help r-help@r-project.org You have to take some things into account: - the maximum memory set for R might not be the maximum memory available - R needs the memory not only for the dataset; matrix manipulations frequently require double the amount of memory taken by the dataset - memory allocation is important when dealing with large datasets; there is plenty of information about that - R has some packages to get around memory problems with big datasets. Read this discussion for example: http://tolstoy.newcastle.edu.au/R/help/05/05/4507.html and this page of Matthew Keller is a good summary too: http://www.matthewckeller.com/html/memory.html Cheers Joris On Sat, Jun 5, 2010 at 12:32 AM, Nathan Stephens nwsteph...@gmail.com wrote: For me, I've found that I can easily work with 1 GB datasets. This includes linear models and aggregations. Working with 5 GB becomes cumbersome. Anything over that, and R croaks. I'm using a dual quad core Dell with 48 GB of RAM. I'm wondering if there is anyone out there running jobs in the 100 GB range. If so, what does your hardware look like? --Nathan
-- Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php On Mon, Jun 14, 2010 at 12:07 PM, Meenakshi meenakshichidamba...@gmail.com wrote: Hi, I want to import a 1.5 GB CSV file into R, but the following error comes up: 'vector allocation 12.4 size' [...] -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
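To make the memory advice in this thread concrete, here is a minimal sketch of two read.csv() arguments that typically reduce the memory and time cost of importing a large CSV. The file name and the column classes below are hypothetical, not from the original post:

```r
# Sketch only: "big.csv" and the column classes are assumed for illustration.
dat <- read.csv("big.csv",
                colClasses = c("integer", "numeric", "character"),  # skip type-guessing
                nrows = 2e6,           # a known upper bound on rows helps R allocate once
                comment.char = "")     # disable comment scanning for speed
```

For files that genuinely exceed available RAM, the chunked-reading and file-backed approaches discussed in the threads linked above are the usual escape hatch.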
Re: [R] Large Data
http://www.google.com/#hl=ensource=hpq=R+big+data+setsaq=faqi=g1aql=oq=gs_rfai=fp=686584f57664 Cheers Joris On Mon, Jun 14, 2010 at 12:07 PM, Meenakshi meenakshichidamba...@gmail.com wrote: Hi, I want to import a 1.5 GB CSV file into R. [...] -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be
Re: [R] list matching
Try this: cbind(mynames[rep(seq(nrow(mynames)), sapply(lst, length)),], unlist(lst)) On Mon, Jun 14, 2010 at 9:06 AM, Yuan Jian jayuan2...@yahoo.com wrote: Hello, I could not find a clear solution for the following question. [...] -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
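A step-by-step sketch of what the one-liner above does, using the data from the original post (the intermediate variable idx is added here purely for illustration):

```r
mynames <- cbind(c('a', 'b'), c(11, 22))   # 2 x 2 character matrix
lst     <- list(a = c(1, 2), b = 5)

# Repeat row i of 'mynames' once for each element of lst[[i]]:
idx <- rep(seq(nrow(mynames)), sapply(lst, length))  # c(1, 1, 2)

# Then bind the flattened list values alongside the expanded rows:
cbind(mynames[idx, ], unlist(lst))
```

Note that cbind() on a character matrix coerces the numeric values to character; convert columns afterwards if numeric types matter.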
[R] Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical ?
Hi R-help, Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical? Sincerely, Andrea Bernasconi DG

PROBLEM EXAMPLE I consider the latin squares example described at page 157 of the book Statistics for Experimenters: Design, Innovation, and Discovery by George E. P. Box, J. Stuart Hunter, William G. Hunter. This example uses the data file /BHH2-Data/tab0408.dat from ftp://ftp.wiley.com/ in /sci_tech_med/statistics_experimenters/BHH2-Data.zip. The file tab0408.dat contains the following DATA:

   driver cars additive  y
1       1    1        A 19
2       2    1        D 23
3       3    1        B 15
4       4    1        C 19
5       1    2        B 24
6       2    2        C 24
7       3    2        D 14
8       4    2        A 18
9       1    3        D 23
10      2    3        A 19
11      3    3        C 15
12      4    3        B 19
13      1    4        C 26
14      2    4        B 30
15      3    4        A 16
16      4    4        D 16

Now summary( aov(MODEL, data=DATA) ) gives:

            Df Sum Sq Mean Sq F value Pr(>F)
cars         1   12.8  12.800  0.8889 0.3680
driver       1  115.2 115.200  8.0000 0.0179 *
additive     3   40.0  13.333  0.9259 0.4634
Residuals   10  144.0  14.400
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These results differ from the book's result at p. 159, since cars and driver are treated as numerical variables by aov.

BRUTE FORCE SOLUTION Manually transforming cars and driver into categorical variables, I obtain the correct result with DATA_AB:

   driver cars additive  y
1      D1   C1        A 19
2      D2   C1        D 23
3      D3   C1        B 15
4      D4   C1        C 19
5      D1   C2        B 24
6      D2   C2        C 24
7      D3   C2        D 14
8      D4   C2        A 18
9      D1   C3        D 23
10     D2   C3        A 19
11     D3   C3        C 15
12     D4   C3        B 19
13     D1   C4        C 26
14     D2   C4        B 30
15     D3   C4        A 16
16     D4   C4        D 16

summary( aov(MODEL, data=DATA_AB) ) gives:

            Df Sum Sq Mean Sq F value   Pr(>F)
cars         3     24   8.000     1.5 0.307174
driver       3    216  72.000    13.5 0.004466 **
additive     3     40  13.333     2.5 0.156490
Residuals    6     32   5.333
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

QUESTION Which is the easiest (most elegant) way to force driver and cars from DATA to be treated as categorical variables by aov? More generally, which is the easiest way to force aov to treat numerical variables as categorical?
Re: [R] Lattice: How to color the data points in splom() according to the panel they are plotted?
Dear Deepayan, this is in reply to a message almost 6 months ago: Deepayan Sarkar deepayan.sar...@gmail.com on Sun, 17 Jan 2010 01:39:21 -0800 writes: On Sat, Jan 16, 2010 at 11:56 PM, Peter Ehlers ehl...@ucalgary.ca wrote: Marius Hofert wrote: Dear ExpeRts, I have the scatter plot matrix as given below. I would like the different sub-plots in the scatter plot matrix to be colored differently. How do I get all points shown in the upper-left plot (on position (1,1) in the scatter plot matrix) to be plotted in blue, and the points shown in the plot to the right (on position (1,2) in the scatter plot matrix) to be plotted in red? More generally, how can I provide a matrix of colors to be used by splom() such that all data points in the corresponding sub-plot of the scatter plot matrix are shown in the specified color? Cheers, Marius Here is the code: library(lattice) entrymat=matrix(0,nrow=3,ncol=3) entrymat[1,2]="black" entrymat[1,3]="blue" entrymat[2,3]="red" entrymat=t(entrymat) splom(~iris[,1:3],superpanel=function(z,...){ mymat.df=data.frame(rows=as.vector(row(entrymat)),cols=as.vector(col(entrymat)),entries=as.vector(entrymat)) mymat.df=subset(mymat.df,cols<rows) with(mymat.df,{ panel.text(x=rows,y=cols,labels=entries) }) panel.pairs(z,upper.panel=panel.splom,lower.panel=function(...){},...) },varnames=c(1,2,3) ) I think that you will have to modify panel.pairs to get what you want. But I must admit that I can't see why you would want such a plot. What's achieved by having different colours in different subpanels? And you would lose the ability to colour groups differently (or things would become really complicated and messy). Thanks, I was going to say the same thing, except that it would be (1) conceptually simpler just to add the 'i' and 'j' values as arguments to the panel function (the 'pargs' variable). The colors could then be passed through as part of the ... arguments, and the relevant entry extracted in the panel function.
(2) The other option is to keep a global counter and increment it inside the panel function, choosing colors based on that counter and knowledge of the order in which panels are drawn. Not very elegant, but the least intrusive solution I can think of. -Deepayan Against the R-forge version of lattice, the following very small patch to panel.pairs would allow users to use '(1)', i.e., provide panel functions with (i,j) arguments directly. I'm pretty sure that the change could not easily break existing splom() usage. --- R/splom.R (revision 619) +++ R/splom.R (working copy) @@ -291,7 +291,8 @@ y = z[subscripts, i]), ##panel.number = panel.number, ##packet.number = packet.number), - list(...)) + list(...), + list(i = i, j = j)) else c(list(x = z[subscripts, j], y = z[subscripts, i], @@ -299,7 +300,8 @@ subscripts = subscripts), ##panel.number = panel.number, ##packet.number = packet.number), - list(...)) + list(...), + list(i = i, j = j)) if (!("..." %in% names(formals(panel)))) pargs <- pargs[intersect(names(pargs), names(formals(panel)))] With the above change, a user could use a panel function with (i,j) arguments, and e.g. say Cmat <- outer(1:6, 1:6, function(i,j) rainbow(11, start=.12, end=.5)[i+j-1]) splom(~diag(6), ## for testing: superpanel=mypanel.pairs, panel=function(x,y,i,j,...){ panel.fill(Cmat[i,j]); panel.splom(x,y,...) panel.text(.5,.5, paste("(",i,",",j,")",sep="")) }) I think that would allow quite a bit more flexibility without the need to explicitly hack panel.pairs (and having to maintain such a hack against the ever-enhancing lattice). Martin Maechler, ETH Zurich
Re: [R] logistic regression with 50 varaibales
On Jun 13, 2010, at 10:20 PM, array chip wrote: Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks John The general rule of thumb is to have 10-20 'events' per covariate degree of freedom. Frank has suggested that in some cases that number should be as high as 25. The number of events is the smaller of the two possible outcomes for your binary dependent variable. Covariate degrees of freedom refers to the number of columns in the model matrix. Continuous variables are 1, binary factors are 1, K-level factors are K - 1. So if out of your 1800 records, you have at least 500 to 1000 events, depending upon how many of your 50 variables are K-level factors and whether or not you need to consider interactions, you may be OK. Better if towards the high end of that range, especially if the model is for prediction versus explanation. Two excellent references would be Frank's book: http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/ and Steyerberg's book: http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/ to assist in providing guidance for model building/validation techniques. HTH, Marc Schwartz
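As a rough back-of-the-envelope check of the events-per-variable rule of thumb above (the numbers are illustrative, not from the original post):

```r
# Events needed for a logistic model with p covariate degrees of freedom,
# under the 10-20 events-per-variable rule of thumb:
p   <- 50
epv <- c(10, 15, 20)
p * epv   # 500, 750, 1000 events
```

This matches the "at least 500 to 1000 events" range quoted in the reply; with ~1800 observations, the binding constraint is how many of those observations are events, not the total sample size.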
Re: [R] Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical ?
Hi, See ?factor, e.g.: DATA$driver <- factor(DATA$driver) See also the levels= argument if you want to change the order of your levels. HTH, Ivan On 6/14/2010 14:52, Andrea Bernasconi DG wrote: Hi R help, Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical? [...]
Re: [R] Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical ?
I think I found the solution! cc <- factor(cars) dd <- factor(driver) MODEL <- y ~ cc + dd + additive summary(aov(MODEL, data=DATA)) On 14 Jun, 2010, at 2:52 PM, Andrea Bernasconi DG wrote: Hi R help, Which is the easiest (most elegant) way to force aov to treat numerical variables as categorical? [...]
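An equivalent variant (a sketch; DATA and the variable names as defined in the original post) applies factor() inside the model formula, which leaves the data frame itself untouched:

```r
# Coerce driver and cars to categorical on the fly, inside the formula:
summary(aov(y ~ factor(driver) + factor(cars) + additive, data = DATA))
```

This avoids creating the helper variables cc and dd in the workspace, and keeps the coercion visible in the model specification.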
[R] merging data frames
Hi, is it possible to merge two data frames while preserving the row names of the bigger data frame? I have two data frames which I would like to combine, but in doing so I always lose the row names. When I try to append them instead, I get an error message saying that I have non-unique names, even though I used unique() on the data frame where the duplicate entries supposedly are. Thanks for the help, Assa
Re: [R] how to change default help settings from factory default html
On 14.06.2010 14:11, Katya Mauff wrote: Hi all Apologies if this is a trivial question- I have searched the lists and the online help files etc but have not managed to find anything. I recently downloaded the latest version of R, which has the help type set to htmlhelp as default (according to http://127.0.0.1:18380/library/utils/html/help.html) I would very much like to be able to access the help files when I am offline by typing ?topic etc as I used to with the previous R version. Any suggestions? Yes, just type ?topic and your web browser opens and displays the help file. Note that the address beginning 'http://127.0.0.1' is on your local machine, hence you can stay offline. Uwe Ligges
Re: [R] list matching
One thing you might do is to transform the data into a format that is easier to combine; I like using 'merge': mynames=cbind(c('a','b'),c(11,22)) lst=list(a=c(1,2), b=5) mynames [,1] [,2] [1,] "a" "11" [2,] "b" "22" lst $a [1] 1 2 $b [1] 5 mynames.df <- as.data.frame(mynames) mynames.df V1 V2 1 a 11 2 b 22 lst.s <- stack(lst) lst.s values ind 1 1 a 2 2 a 3 5 b merge(mynames.df, lst.s, by.x="V1", by.y="ind") V1 V2 values 1 a 11 1 2 a 11 2 3 b 22 5 On Mon, Jun 14, 2010 at 8:06 AM, Yuan Jian jayuan2...@yahoo.com wrote: Hello, I could not find a clear solution for the following question. [...] -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] merging data frames
Put the rownames as another column in your dataframe so that it remains with the data. After merging, you can then use it as the rownames. On Mon, Jun 14, 2010 at 9:25 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hi, is it possible to merge two data frames while preserving the row names of the bigger data frame? [...] -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
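A minimal sketch of that suggestion; the data frame names ('big', 'small') and the shared key column "id" are hypothetical, not from the original post:

```r
# Stash the row names in a real column so merge() cannot drop them:
big$.rn <- rownames(big)
merged  <- merge(big, small, by = "id")  # note: merge() may also reorder rows
rownames(merged) <- merged$.rn           # restore the row names afterwards
merged$.rn <- NULL                       # drop the helper column
```

This works because merge() keeps ordinary columns intact while discarding row names; the restore step assumes the merge result still has one row per original row name (i.e. no duplicated keys).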
Re: [R] how to change default help settings from factory default html
On Mon, 14 Jun 2010, Katya Mauff wrote: Hi all Apologies if this is a trivial question- I have searched the lists and the online help files etc but have not managed to find anything. I recently downloaded the latest version of R, which has the help type set to htmlhelp as default (according to http://127.0.0.1:18380/library/utils/html/help.html) Not so for the latest R (2.11.1) compiled with the default settings. The help(help) page in 2.11.1 actually says The ‘factory-fresh’ default is text help except from the Mac OS GUI, which uses HTML help displayed in its own browser window. However, the Windows installer allows you to set the default type of help, and its default is HTML help in R = 2.10.0. I would very much like to be able to access the help files when I am offline by typing ?topic etc as I used to with the previous R version. Any suggestions? You should be able to use HTML help offline: it is running locally on your machine (127.0.0.1 is the loopback interface). (It is conceivable that your unstated OS disables the loopback interface when 'offline', but we have not come across an instance of this.) But ?help tells you how select other forms of help by default via help_type = getOption(help_type)) and ?'?' says This is a shortcut to ‘help’ and uses its default type of help. Thanks Katya ### UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and e-mail disclaimer published on our website at http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable from +27 21 650 4500. This e-mail is intended only for the person(s) to whom it is addressed. If the e-mail has reached you in error, please notify the author. If you are not the intended recipient of the e-mail you may not use, disclose, copy, redirect or print the content. If this e-mail is not related to the business of UCT it is sent by the sender in the sender's individual capacity. 
### -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
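The per-session mechanism Prof. Ripley refers to can be sketched directly; setting the option in ~/.Rprofile would make it the default across sessions:

```r
# Switch the current session to plain-text help;
# ?topic and help(topic) will then use text help by default
options(help_type = "text")
getOption("help_type")
```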
Re: [R] logistic regression with 50 varaibales
Hi, Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect), which starts to play a role rather quickly. The curse of dimensionality is easily demonstrated by looking at the proximity between your data points. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2 = 400 observations; in a 10-dimensional space this becomes 20^10 ~ 10^13 data points. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave. Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, as your data space is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, and then I didn't even start on interactions. No point in trying, I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first. Cheers Joris On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 13, 2010, at 10:20 PM, array chip wrote: Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks John The general rule of thumb is to have 10-20 'events' per covariate degree of freedom. Frank has suggested that in some cases that number should be as high as 25. The number of events is the smaller of the two possible outcomes for your binary dependent variable. Covariate degrees of freedom refers to the number of columns in the model matrix. 
Continuous variables are 1, binary factors are 1, K-level factors are K - 1. So if out of your 1800 records you have at least 500 to 1000 events, then depending upon how many of your 50 variables are K-level factors and whether or not you need to consider interactions, you may be OK. Better if towards the high end of that range, especially if the model is for prediction versus explanation. Two excellent references would be Frank's book: http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/ and Steyerberg's book: http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/ to assist in providing guidance for model building/validation techniques. HTH, Marc Schwartz -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
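Marc's rule of thumb is easy to check with back-of-the-envelope arithmetic; the event count below is hypothetical, since the original poster only gave the total of ~1800 observations:

```r
# Hypothetical: suppose 600 of the 1800 records fall in the
# smaller outcome class (the "events")
n_events     <- 600
covariate_df <- 50   # assuming all 50 variables are continuous/binary (1 df each)

events_per_df <- n_events / covariate_df
events_per_df  # 12: within the 10-20 guideline, but well below Frank's 25
```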
Re: [R] how to change default help settings from factory default html
Hi - I have tried that offline; my browser opens and says: Offline mode. Firefox is currently in offline mode and can't browse the Web. Uncheck Work Offline in the File menu, then try again. I can access ?help that way, or pages I've been to before having gone offline, but nothing new. Uwe Ligges lig...@statistik.tu-dortmund.de 2010/06/14 03:29 PM On 14.06.2010 14:11, Katya Mauff wrote: Hi all Apologies if this is a trivial question - I have searched the lists and the online help files etc. but have not managed to find anything. I recently downloaded the latest version of R, which has the help type set to HTML help as default (according to http://127.0.0.1:18380/library/utils/html/help.html) I would very much like to be able to access the help files when I am offline by typing ?topic etc. as I used to with the previous R version. Any suggestions? Yes, just type ?topic and your web browser opens and displays the help file. Note that the address beginning http://127.0.0.1 is on your local machine, hence you can stay offline. Uwe Ligges Thanks Katya 
Re: [R] Html help
If the IP number is something like 127.0.0.1:x, then you are on your local computer. Cheers Joris On Mon, Jun 14, 2010 at 1:33 PM, Murray Jorgensen m...@waikato.ac.nz wrote: I have just installed R 2.11.1 on my XP laptop. I like html help for browsing but text help for on-the-fly look-ups. I was a bit surprised when I was asked to choose between them during the installation. I chose text, thinking I could fix the html help later, which is what I am trying to do now. Now when I ask for html help my browser goes to 'http://-ip-number-/doc/html/index.html' instead of where I want on my computer: C:\apps\R\R-2.11.1\doc\html\index.html Now I can go where I want manually, but then the package list in C:\apps\R\R-2.11.1\doc\html\packages.html does not include all the packages that I have installed and linked. I don't want to read my html help from the web because sometimes I am off-line or on a slow connection. How do I go about getting a local set of html help files? Cheers, Murray Jorgensen -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: m...@waikato.ac.nz majorgen...@ihug.co.nz Fax +64 7 838 4155 Phone +64 7 838 4773 wk Home +64 7 825 0441 Mobile 021 0200 8350 -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
[R] remove last char of a text string
Dear R experts, is there a simple way to remove the last character of a text string? The substr() function takes only start and end positions as parameters... but my strings are of different lengths... 01asap05a -> 01asap05 02ee04b -> 02ee04 Thank you all, Gianandrea -- View this message in context: http://r.789695.n4.nabble.com/remove-last-char-of-a-text-string-tp2254377p2254377.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] remove last char of a text string
On Mon, Jun 14, 2010 at 3:47 PM, glaporta glapo...@freeweb.org wrote: Dear R experts, is there a simple way to remove the last character of a text string? The substr() function takes only start and end positions as parameters... but my strings are of different lengths... 01asap05a -> 01asap05 02ee04b -> 02ee04 Thank you all, Gianandrea It's not terribly elegant, but this works: orig.text <- c("01asap05a", "02ee04b") substr(orig.text, 1, nchar(orig.text) - 1) Regards, Gustaf -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address: Essingetorget 40, 112 66 Stockholm, SE skype: gustaf_rydevik
Re: [R] remove last char of a text string
Sure. You can use nchar() to find out how long the string is. teststring <- "01asap05a" substr(teststring, 1, nchar(teststring) - 1) [1] "01asap05" On Mon, Jun 14, 2010 at 9:47 AM, glaporta glapo...@freeweb.org wrote: Dear R experts, is there a simple way to remove the last character of a text string? The substr() function takes only start and end positions as parameters... but my strings are of different lengths... 01asap05a -> 01asap05 02ee04b -> 02ee04 Thank you all, Gianandrea -- Sarah Goslee http://www.functionaldiversity.org
Re: [R] how to change default help settings from factory default html
On 14.06.2010 15:39, Katya Mauff wrote: Hi - I have tried that offline; my browser opens and says: Offline mode. Firefox is currently in offline mode and can't browse the Web. Uncheck Work Offline in the File menu, then try again. Well, go online with Firefox, which means Firefox can access the R help pages afterwards. You do not need to really go online with your machine. Uwe I can access ?help that way, or pages I've been to before having gone offline, but nothing new. Uwe Ligges lig...@statistik.tu-dortmund.de 2010/06/14 03:29 PM On 14.06.2010 14:11, Katya Mauff wrote: Hi all Apologies if this is a trivial question - I have searched the lists and the online help files etc. but have not managed to find anything. I recently downloaded the latest version of R, which has the help type set to HTML help as default (according to http://127.0.0.1:18380/library/utils/html/help.html) I would very much like to be able to access the help files when I am offline by typing ?topic etc. as I used to with the previous R version. Any suggestions? Yes, just type ?topic and your web browser opens and displays the help file. Note that the address beginning http://127.0.0.1 is on your local machine, hence you can stay offline. Uwe Ligges Thanks Katya 
Re: [R] remove last char of a text string
Try: gsub(".$", "", c('01asap05a', '02ee04b')) On Mon, Jun 14, 2010 at 10:47 AM, glaporta glapo...@freeweb.org wrote: Dear R experts, is there a simple way to remove the last character of a text string? The substr() function takes only start and end positions as parameters... but my strings are of different lengths... 01asap05a -> 01asap05 02ee04b -> 02ee04 Thank you all, Gianandrea -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
[R] Prime Numbers Pkgs - Schoolmath is broken
Looking for a recommended package that handles prime number computations. Tried the following unsuccessfully: primeFactors() in the R.basic package failed to install. primes() and primlist are broken in the Schoolmath pkg on CRAN. My analysis can be found here: http://j.mp/9BNI9q Not sure what the procedure is for getting things fixed, so I've cross-posted to r-devel as well. --njg TAKING THE PITH OUT OF PERFORMANCE http://perfdynamics.blogspot.com/ Follow me on Twitter http://twitter.com/DrQz PERFORMANCE DYNAMICS COMPANY http://www.perfdynamics.com/
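Until the package situation is sorted out, a plain base-R Sieve of Eratosthenes covers simple needs; this is a sketch (function name `primes_up_to` is made up), not a replacement for a tested package:

```r
# Sieve of Eratosthenes: all primes up to n, base R only
primes_up_to <- function(n) {
  if (n < 2) return(integer(0))
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE
  if (n >= 4) {
    for (i in 2:floor(sqrt(n))) {
      # Mark every multiple of i from i^2 upward as composite
      if (is_prime[i]) is_prime[seq(i * i, n, by = i)] <- FALSE
    }
  }
  which(is_prime)
}

primes_up_to(20)  # 2 3 5 7 11 13 17 19
```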
Re: [R] Cforest and Random Forest memory use
The first thing that I would recommend is to avoid the formula interface to models. The internals that R uses to create matrices from a formula + data set are not efficient. If you had a large number of variables, I would have automatically pointed to that as a source of issues. cforest and ctree only have formula interfaces though, so you are stuck on that one. The randomForest package has both interfaces, so that might be better. Probably the issue is the depth of the trees. With that many observations, you are likely to get extremely deep trees. You might try limiting the depth of the tree and see if that has an effect on performance. We run into these issues with large compound libraries; in those cases we do whatever we can to avoid ensembles of trees or kernel methods. If you want those, you might need to write your own code that is hyper-efficient and tuned to your particular data structure (as we did). On another note... are this many observations really needed? You have 40-ish variables; I suspect that 1M points are pretty densely packed into 40-dimensional space. Do you lose much by sampling the data set or allocating a large portion to a test set? If you have thousands of predictors, I could see the need for so many observations, but I'm wondering if many of the samples are redundant. Max On Mon, Jun 14, 2010 at 3:45 AM, Matthew OKane mlok...@gmail.com wrote: Answers added below. Thanks again, Matt On 11 June 2010 14:28, Max Kuhn mxk...@gmail.com wrote: Also, you have not said: - your OS: Windows Server 2003 64-bit - your version of R: 2.11.1 64-bit - your version of party: 0.9-9995 - your code: test.cf <- cforest(formula = badflag ~ ., data = example, control = cforest_control(teststat = 'max', testtype = 'Teststatistic', replace = FALSE, ntree = 500, savesplitstats = FALSE, mtry = 10)) - what Large data set means: 1 million observations, 40+ variables, around 200MB - what very large model objects means - anything which breaks So... how is anyone supposed to help you? 
Max -- Max
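Max's subsampling suggestion can be sketched with simulated data and the recommended rpart package (which ships with R); rpart stands in here for the original cforest/randomForest calls, and the sizes are scaled down from the poster's 1M rows:

```r
library(rpart)  # recommended package, ships with R

set.seed(42)
n   <- 100000                  # stand-in for the much larger real data set
big <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
big$badflag <- factor(big$x1 + rnorm(n) > 0)

# Subsample rows before fitting, and cap tree depth to limit
# memory use and fitting time
idx   <- sample(nrow(big), 10000)
train <- big[idx, ]
fit   <- rpart(badflag ~ ., data = train,
               control = rpart.control(maxdepth = 5))
```

The held-out rows (`big[-idx, ]`) can then serve as the test set Max mentions.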
Re: [R] Lattice: How to color the data points in splom() according to the panel they are plotted?
On Mon, Jun 14, 2010 at 6:24 PM, Martin Maechler maech...@stat.math.ethz.ch wrote: Dear Deepayan, this is in reply to a message almost 6 months ago: Deepayan Sarkar deepayan.sar...@gmail.com [...] Thanks, I was going to say the same thing, except that it would be (1) conceptually simpler just to add the 'i' and 'j' values as arguments to the panel function (the 'pargs' variable). The colors could then be passed through as part of the ... arguments, and the relevant entry extracted in the panel function. (2) The other option is to keep a global counter and increment it inside the panel function, choosing colors based on that counter and knowledge of the order in which panels are drawn. Not very elegant, but the least intrusive solution I can think of. -Deepayan Against the R-forge version of lattice, the following very small patch to panel.pairs would allow users to use '(1)', i.e., provide panel functions with (i,j) arguments directly. I'm pretty sure that the change could not easily break existing splom() usage. --- R/splom.R (revision 619) +++ R/splom.R (working copy) @@ -291,7 +291,8 @@ y = z[subscripts, i]), ## panel.number = panel.number, ## packet.number = packet.number), - list(...)) + list(...), + list(i = i, j = j)) else c(list(x = z[subscripts, j], y = z[subscripts, i], @@ -299,7 +300,8 @@ subscripts = subscripts), ## panel.number = panel.number, ## packet.number = packet.number), - list(...)) + list(...), + list(i = i, j = j)) if (!("..." %in% names(formals(panel)))) pargs <- pargs[intersect(names(pargs), names(formals(panel)))] Done in r-forge svn. -Deepayan With the above change, a user could use a panel function with (i,j) arguments, and e.g. say Cmat <- outer(1:6, 1:6, function(i,j) rainbow(11, start=.12, end=.5)[i+j-1]) splom(~diag(6), ## for testing: superpanel=mypanel.pairs, panel=function(x,y,i,j,...){ panel.fill(Cmat[i,j]); panel.splom(x,y,...) 
panel.text(.5, .5, paste("(", i, ",", j, ")", sep="")) }) I think that would allow quite a bit more flexibility without the need to explicitly hack panel.pairs (and having to maintain such a hack against the ever-enhancing lattice). Martin Maechler, ETH Zurich
Re: [R] logistic regression with 50 varaibales
Dear all, (this first part of the email I sent to John earlier today, but forgot to put it to the list as well) Dear John, Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks First: I'm not a statistician, but a spectroscopist. But I do build logistic regression models with far less than 1800 samples and far more variates (e.g. 75 patients / 256 spectral measurement channels). Though I have many measurements per sample: typically several hundred spectra per sample. Question: are the 1800 real, independent samples? Model stability is something you can measure. Do an honest validation of your model with really _independent_ test data and measure the stability according to what your stability needs are (e.g. stable parameters or stable predictions?). (From here on, reply to Joris) Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect), which starts to play a role rather quickly. No doubt. The curse of dimensionality is easily demonstrated by looking at the proximity between your data points. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2 = 400 observations; in a 10-dimensional space this becomes 20^10 ~ 10^13 data points. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave. But won't also the distance between groups grow? No doubt, high-dimensional spaces are _very_ unintuitive. 
However, the required sample size may grow substantially slower, if the model has appropriate restrictions. I remember the recommendation of at least 5 samples per class and variate for linear classification models. I.e. not to get a good model, but to have a reasonable chance of getting a stable model. Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, Am I wrong thinking that there may be a substantial difference between stability of predictions and stability of model parameters? BTW: if the models are unstable, there's also aggregation. At least for my spectra I can give toy examples with physical-chemical explanation that yield the same prediction with different parameters (of course because of correlation). as your dataspace is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, No, not necessary. IMHO it depends very much on the meaning of the variables. E.g. for the spectra a set of model parameters may be interpreted like spectra or difference spectra. Of course this has to do with the fact, that a parallel coordinate plot is the more natural view of spectra compared to a point in so many dimensions. and then I didn't even start on interactions. No point in trying I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first. Which puts to my mind a question I've had since long: I assume that all variables that I know beforehand to be without information are already discarded. The dimensionality is then further reduced in a data-driven way (e.g. by PCA or PLS). The model is built in the reduced space. How much less samples are actually needed, considering the fact that the dimension reduction is a model estimated on the data? ...which of course also means that the honest validation embraces the data-driven dimensionality reduction as well... Are there recommendations about that? 
The other curious question I have is: I assume that it is impossible for him to obtain the 10^xy samples required for comfortable model building. So what is he to do? Cheers, Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 0 40 5 58-37 68 email: cbelei...@units.it
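The data-driven dimension reduction Claudia and Joris describe can be sketched with prcomp() followed by a logistic fit on the leading scores; the data is simulated with the poster's dimensions, and the choice of 5 components is arbitrary:

```r
set.seed(1)
n <- 1800; p <- 50
X <- matrix(rnorm(n * p), ncol = p)
y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))  # outcome driven by 2 variables

# Reduce the 50 predictors to a few principal components,
# then fit the logistic model in the reduced space
pc     <- prcomp(X, scale. = TRUE)
scores <- pc$x[, 1:5]
fit    <- glm(y ~ scores, family = binomial)
length(coef(fit))  # intercept + 5 component scores
```

As Claudia notes, for an honest validation the PCA step itself must be re-estimated inside each resampling fold, not computed once on the full data.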
Re: [R] how to ignore rows missing arguments of a function when creating a function?
Ah, I overlooked that possibility. You can do the following: not <- attr(fm$model, "na.action") if (!is.null(not)) { # only drop the NA values if there are any left out of the model cluster <- cluster[-not] dat <- dat[-not, ] } with(dat, { On Mon, Jun 14, 2010 at 4:30 PM, edmund jones edmund.j.jo...@gmail.com wrote: Thanks a lot! So, what you propose works perfectly (the cluster variable is indeed a vector), except in the case where I have no missing values in my regression. With the following function: cl2 <- function(dat, fm, cluster){ attach(dat, warn.conflicts = F) require(sandwich) require(lmtest) not <- attr(fm$model, "na.action") cluster <- cluster[-not] with(dat[-not, ], { M <- length(unique(cluster)) N <- length(cluster) K <- fm$rank dfc <- (M/(M-1))*((N-1)/(N-K)) uj <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum)) vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N) coeftest(fm, vcovCL) }) } If I have no missing values in the arguments of fm, I get the message: Error in -not : invalid argument to unary operator In your example: x <- rnorm(100) y <- c(rnorm(90), NA, rnorm(9)) test <- lm(x~y) str(test) List of 13 ... (tons of information) $ model :'data.frame': 99 obs. of 2 variables: ... (more tons of information) ..- attr(*, "na.action")=Class 'omit' Named int 91 .. .. 
..- attr(*, "names")= chr "91" - attr(*, "class")= chr "lm" Now we know that we can do: not <- attr(test$model, "na.action") y[-not] If I have, for example, y <- rnorm(100), I also get the error: Error in -not : invalid argument to unary operator In my database:
    female income transport lunch dist reg_f
4900 0 18.405990 0 0 75 750
4901 0 NA 0 NA 75 753
4902 1 NA 0 1 75 752
4903 1 NA 0 1 75 751
4904 1 69.678340 1 0 74 740
4905 0 57.953230 1 0 73 730
4906 1 85.835130 0 1 68 680
4907 0 81.952980 0 0 75 750
4908 1 46.837490 1 0 74 740
4909 0 NA 1 0 5 52
4910 1 65.041360 0 1 75 750
4911 0 77.451870 1 0 75 750
4912 0 96.148590 1 0 75 750
4913 0 64.510410 0 0 74 740
4914 0 69.391230 0 0 75 750
4915 0 4.804243 0 1 65 650
4916 0 NA 0 0 75 751
4917 1 NA 0 0 75 751
4918 1 NA 0 1 40 401
4919 1 49.920750 0 1 76 760
4920 0 NA 0 1 76 763
4921 0 10.187910 0 0 77 770
4922 0 14.839710 0 1 77 770
4923 1 32.041000 0 0 77 770
4924 0 85.639440 0 0 77 770
4925 1 86.308410 0 0 68 680
4926 0 79.223910 0 0 7 70
4927 0 81.825800 0 0 78 780
4928 0 31.931000 0 1 37 370
4929 0 53.282310 0 1 41 410
4930 1 31.312910 1 1 25 250
4931 1 50.478870 0 1 25 250
4932 0 NA 0 0 66 662
4933 1 58.156940 0 1 31 310
4934 0 NA 0 1 1 13
4935 1 NA 0 1 1 12
4936 1 59.149180 0 0 3 30
4937 1 5.400807 0 1 5 50
4938 1 76.828630 0 0 6 60
4939 1 73.488300 0 1 63 630
4940 0 6.529074 0 1 6 60
4941 0 NA 0 0 6 61
4942 1 70.128530 0 0 3 30
4943 0 NA 0 1 53 531
4944 1 75.715350 1 0 6 60
4945 0 8.623850 0 1 24 240
4946 1 79.062470 0 1 62 620
4947 0 83.863370 1 1 11 110
4948 0 58.904450 0 1 62 620
4949 0 88.500290 0 0 9 90
4950 0 NA 1 1 9 90
4951 NA NA 0 1 15 151
It works perfectly if I do cl2(testdata, lm(income ~ transport + reg_f), female) but not for cl2(testdata, lm(dist ~ transport + reg_f), female), or any other case where the arguments of the lm function have no missing values. How can I tell R to do this only if there is a missing-value problem? Thanks a lot for your help! 
Working on your previous replies has been very helpful in understanding how R works. Cheers, Edmund. 2010/6/13 Joris Meys jorism...@gmail.com Next to that, of course you have to use the right indices for cluster, but as I have no clue what dist is, I just put there something. So if it is a matrix, naturally you'll get errors. If I make a vector cluster with the same length as the
[R] using R to draw over a distribution.
Hi, Suppose I analyze a log to create a histogram: event E1 occurred N1 times, event E2 occurred N2 times, ... and so on for m total events ... event Em occurred Nm times. The total number of occurrences is T = sum(N_j), j = 1..m. I want to give this histogram to R and ask it to produce T random events such that approximately N1 instances of E1 are drawn, N2 instances of E2 are drawn, and so forth. Regards, Shane
Re: [R] using R to draw over a distribution.
On Jun 14, 2010, at 10:42 AM, SHANE MILLER, BLOOMBERG/ 731 LEXIN wrote: Hi, Suppose I analyze a log to create a histogram: event E1 occurred N1 times, event E2 occurred N2 times, ... and so on for m total events ... event Em occurred Nm times. The total number of occurrences is T = sum(N_j), j = 1..m. I want to give this histogram to R and ask it to produce T random events such that approximately N1 instances of E1 are drawn, N2 instances of E2 are drawn, and so forth. ?table # or perhaps use the fact that hist() will return a table of a particular type. ?sample # from the events with prob = frequencies/T (If you insist on constraining the total count to be the observed sum it will no longer be random with m degrees of freedom. And there are many postings on how to do this in the archives.) -- David Winsemius, MD West Hartford, CT
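David's sample() suggestion can be sketched directly; the event histogram below is made up, standing in for Shane's log counts:

```r
# Hypothetical histogram: event -> observed count N_j
counts <- c(E1 = 5, E2 = 3, E3 = 2)
T      <- sum(counts)   # total number of occurrences

# Draw T events so that event j appears with probability N_j / T;
# each event's expected frequency then matches its observed count
set.seed(7)
draws <- sample(names(counts), size = T, replace = TRUE,
                prob = counts / T)
table(draws)
```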
Re: [R] logistic regression with 50 varaibales
I think the real issue is why the fit is being done. If it is solely to interpolate and condense the dataset, the number of variables is not an important issue. If the issue is developing a model that will capture causality, it is hard to believe that can be accomplished with 50+ variables. With this many, some kind of hunt would have to be done, and the resulting model would not be very stable. It would be better perhaps to first reduce the variable set by, say, principal components analysis, so that a reasonably sized set results. If a stable and meaningful model is the goal, each term in the final model should be plausibly causal. At 10:36 AM 6/14/2010, Claudia Beleites wrote: Dear all, (this first part of the email I sent to John earlier today, but forgot to put it to the list as well) Dear John, Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks First: I'm not a statistician, but a spectroscopist. But I do build logistic regression models with far less than 1800 samples and far more variates (e.g. 75 patients / 256 spectral measurement channels). Though I have many measurements per sample: typically several hundred spectra per sample. Question: are the 1800 real, independent samples? Model stability is something you can measure. Do an honest validation of your model with really _independent_ test data and measure the stability according to what your stability needs are (e.g. stable parameters or stable predictions?). (From here on, reply to Joris) Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect), which starts to play a role rather quickly. No doubt. 
The curse of dimensionality is easily demonstrated by looking at the proximity between your datapoints. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2 = 400 observations; in a 10-dimensional space this becomes 20^10 ~ 10^13 datapoints. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave. But won't the distance between groups also grow? No doubt, high-dimensional spaces are _very_ unintuitive. However, the required sample size may grow substantially more slowly if the model has appropriate restrictions. I remember the recommendation of at least 5 samples per class and variate for linear classification models, i.e. not to get a good model, but to have a reasonable chance of getting a stable model. Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, Am I wrong in thinking that there may be a substantial difference between stability of predictions and stability of model parameters? BTW: if the models are unstable, there's also aggregation. At least for my spectra I can give toy examples with a physical-chemical explanation that yield the same prediction with different parameters (of course because of correlation). as your dataspace is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, No, not necessarily. IMHO it depends very much on the meaning of the variables. E.g. for the spectra, a set of model parameters may be interpreted like spectra or difference spectra. Of course this has to do with the fact that a parallel coordinate plot is the more natural view of spectra compared to a point in so many dimensions. and then I didn't even start on interactions. No point in trying, I'd say.
If you really need all that information, you might want to take a look at some dimension reduction methods first. Which brings to mind a question I've had for a long time: I assume that all variables that I know beforehand to be without information are already discarded. The dimensionality is then further reduced in a data-driven way (e.g. by PCA or PLS). The model is built in the reduced space. How many fewer samples are actually needed, considering the fact that the dimension reduction is a model estimated on the data? ...which of course also means that the honest validation embraces the data-driven dimensionality reduction as well... Are there recommendations about that? The other curious question I have is: I assume that it is impossible for him to obtain the 10^xy samples required for comfortable model building. So what is he to do? Cheers, Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università
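Joris's back-of-the-envelope calculation above is easy to reproduce in a couple of lines of R (a sketch; the 0.05-unit spacing target is taken from his example):

```r
## observations needed to keep ~0.05-unit spacing on a unit hypercube
spacing <- 0.05
dims    <- c(1, 2, 5, 10)
needed  <- (1 / spacing)^dims          # i.e. 20^d points
data.frame(dimension = dims, observations = needed)
# dimension 2 needs 400; dimension 10 needs 20^10 ~ 1.02e13
```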
Re: [R] Prime Numbers Pkgs - Schoolmath is broken
On Mon, 2010-06-14 at 07:16 -0700, Red Roo wrote: Looking for a recommended package that handles prime number computations. Tried the following unsuccessfully: primeFactors() in the R.basic package failed to install. primes() and primlist are broken in Schoolmath pkg on CRAN. My analysis can be found here http://j.mp/9BNI9q Not sure what the procedure is for getting things fixed, so I've cross-posted to r-dev as well. Neither place is correct. This has *nothing* to do with R. Address bug reports directly to the package maintainer, details of which can be found here: http://cran.r-project.org/web/packages/schoolmath/index.html which is what is requested in the posting guide... HTH G --njg TAKING THE PITH OUT OF PERFORMANCE http://perfdynamics.blogspot.com/ Follow me on Twitter http://twitter.com/DrQz PERFORMANCE DYNAMICS COMPANY http://www.perfdynamics.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] logistic regression with 50 variables
Joris, There are two separate issues here: 1. Can you consider an LR model with 50 covariates? 2. Should you have 50 covariates in your LR model? The answer to 1 is certainly yes, given what I noted below as a general working framework. I have personally been involved with the development and validation of LR models with ~35 covariates, albeit with notably larger datasets than discussed below, because the models are used for prediction. In fact, the current incarnations of those same models, now 15 years later, appear to have 40 covariates and are quite stable. The interpretation of the models by both statisticians and clinicians is relatively straightforward. The answer to 2 gets into the subject matter that you raise, which is to consider other factors beyond the initial rules of thumb for minimum sample size. These get into reasonable data reduction methods, the consideration of collinearity, subject matter expertise, sparse data, etc. The issues raised in number 2 are discussed in the two references that I noted. Two additional references that might be helpful here on the first point are: P. Peduzzi, J. Concato, E. Kemper, T. R. Holford, and A. R. Feinstein. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epi, 49:1373–1379, 1996. E. Vittinghoff and C. E. McCulloch. Relaxing the rule of ten events per variable in logistic and Cox regression. Am J Epi, 165:710–718, 2006. Regards, Marc On Jun 14, 2010, at 8:38 AM, Joris Meys wrote: Hi, Marcs explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality(Hughes effect) which starts to play rather quickly. The curse of dimensionality is easily demonstrated looking at the proximity between your datapoints. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. 
To have a proximity like that in a 2-dimensional space, you need 20^2=400 observations. in a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave. Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, as your dataspace is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, and then I didn't even start on interactions. No point in trying I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first. Cheers Joris On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 13, 2010, at 10:20 PM, array chip wrote: Hi, this is not R technical question per se. I know there are many excellent statisticians in this list, so here my questions: I have dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks John The general rule of thumb is to have 10-20 'events' per covariate degree of freedom. Frank has suggested that in some cases that number should be as high as 25. The number of events is the smaller of the two possible outcomes for your binary dependent variable. Covariate degrees of freedom refers to the number of columns in the model matrix. Continuous variables are 1, binary factors are 1, K-level factors are K - 1. So if out of your 1800 records, you have at least 500 to 1000 events, depending upon how many of your 50 variables are K-level factors and whether or not you need to consider interactions, you may be OK. Better if towards the high end of that range, especially if the model is for prediction versus explanation. 
Two excellent references would be Frank's book: http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/ and Steyerberg's book: http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/ to assist in providing guidance for model building/validation techniques. HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
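Marc's rule of thumb above turns into a quick feasibility check in R (a sketch with hypothetical counts; the 10-20 range, up to 25, is the one he cites):

```r
## quick events-per-covariate-df check (made-up numbers for illustration)
n_events      <- 500    # the smaller of the two outcome counts
covariate_df  <- 50     # columns in the model matrix (K-level factor = K - 1)
events_per_df <- n_events / covariate_df
events_per_df           # aim for 10-20 events per df, up to 25 to be safe
# 10
```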
Re: [R] do faster ANOVAS
Thank you; with the matrix for the responses (here my 101 timepoints), it takes less than 30 minutes for 1000 permutations, whereas before it took 2h30! Best regards, Mélissa Message du 10/06/10 18:52 De : Douglas Bates A : melissa Copie à : r-help@r-project.org Objet : Re: [R] do faster ANOVAS The lm and aov functions can take a matrix response, allowing you to fit all of the responses for a single attribute simultaneously. On Thu, Jun 10, 2010 at 8:47 AM, melissa wrote: Dear all R users, I want to run 800,000 ANOVAs and to store the sums of squares of the effects. Here is an extract of my table data: Product attribute subject rep t1 t2 t3 ... t101 P1 A1 S1 R1 1 0 0 ... 1 I want to run 1 ANOVA per timepoint and per attribute; there are 101 timepoints and 8 attributes, so I want to run 808 ANOVAs. This will be an ANOVA with two factors. Here is one example: aov(t1 ~ Subject*Product, data[data$attribute == "A1",]) I want to store for each ANOVA SSprod, SSsujet, SSerreur, SSinter and SStotal. In fact I want the result in several matrices: SSprod matrix: t1 t2 t3 t4 ... t101 A1 ssprod(A1,t1) A2 A3 ... A8 So I would like a matrix like that for ssprod, sssujet, sserreur, ssinter and sstotal.
And this is for one permutation, and I want to do 1000 permutations. Here is my code: SSmatrixglobal <- function(k){ daten.temp <- data; daten.temp$product <- permutations[[k]]; listmat <- apply(daten.temp[,5:105], 2, function(x, y){ tab2 <- as.data.frame(cbind(x, y)); tab.class <- by(tab2[,1:3], tab2[,4], function(x){ f <- formula(paste(names(x)[1], "~", names(x)[2], "*", names(x)[3], sep="")); anovas <- aov(f, data=x); anovas$call$formula <- f; s1 <- summary(anovas); qa <- s1[[1]][,2]; return(qa) }); return(tab.class) }, y=daten.temp[,1:3]); ar <- array(unlist(listmat), dim=c(length(listmat[[1]][[1]]), length(listmat[[1]]), length(listmat))); l <- lapply(1:4, function(i) ar[i,,]); sssujet <- l[[1]]; ssprod <- l[[2]]; ssinter <- l[[3]]; sserreur <- l[[4]]; ss <- rbind(sssujet, ssprod, ssinter, sserreur, sstotal); ss <- as.data.frame(ss); sqlSave(channel, ss, "SS1000", append=TRUE); rm(ss, numperm, daten.temp) } system.time(por <- lapply(1:1000, SSmatrixglobal)) But it takes about 90 seconds per permutation, times 1000, so how can I do faster ANOVAs? Many thanks. Best regards, Mélissa PS: I think that I can gain a lot of time in the aov function but I don't know how. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
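Douglas Bates's suggestion, fitting all timepoints at once via a matrix response, can be sketched as follows (hypothetical small data; the real dataset has 101 timepoint columns):

```r
## fit all timepoints in one call by giving aov() a matrix response
set.seed(1)
dat <- data.frame(Subject = gl(4, 6), Product = gl(3, 2, 24))
Y   <- matrix(rnorm(24 * 5), nrow = 24,
              dimnames = list(NULL, paste0("t", 1:5)))
fit <- aov(Y ~ Subject * Product, data = dat)
## summary() returns one ANOVA table per response column; collect the SS by term
ss  <- sapply(summary(fit), function(s) s[[1]][, "Sum Sq"])
dim(ss)   # one row per effect (incl. residuals), one column per timepoint
```

Replacing 101 single-response aov() calls per attribute with one matrix-response fit is where the reported speedup comes from.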
[R] xtable with Sweave
Hi, I'm using Sweave to prepare a descriptive report. There are at least 20 tables built with the xtable command, of this kind: <<echo=F, results=hide>>= q5 = factor(Q5, label=c("Não", "Sim")) (q5.tab = cbind(table(q5))) @ <<echo=F, results=tex>>= xtable(q5.tab, align="l|c", caption.placement="top", table.placement="H") @ I'm getting the following message: Too many unprocessed floats in the LaTeX file. How can I avoid these messages? -- Silvano Cesar da Costa Departamento de Estatística Universidade Estadual de Londrina Fone: 3371-4346 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
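LaTeX's "Too many unprocessed floats" usually means a long run of floating tables never gets flushed; inserting \clearpage periodically, or forcing exact placement with the float package's H specifier, are the usual remedies. Note also that caption.placement and table.placement are arguments of print.xtable(), not xtable(). A sketch (assumes the xtable package is installed; the data are a toy stand-in):

```r
## placement options belong to print.xtable(), not xtable() (a common slip)
library(xtable)
q5.tab <- cbind(table(factor(c("no", "yes", "yes"))))  # toy stand-in data
print(xtable(q5.tab, align = "l|c", caption = "Example"),
      caption.placement = "top",
      table.placement = "H")   # "H" needs \usepackage{float} in the preamble
```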
[R] lme command
Hi, I am doing a longitudinal data set fit using lme. I used two forms of the lme command and I am getting two different outputs. FIRST out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status, random=list(ident=~Idade+Idade2)) SECOND out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status, random= ~Idade+Idade2|ident, data=dados) I got weird results from the first one and could not understand the reason for it. All the results are exactly the same except the intercept and the two main terms sexo (gender) and status (treatment). Those differences made a lot of difference in the final results. Can anybody tell me what the difference between them is? Thanks. Enrico. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R in Linux: problem with special characters
Hi, First of all, thank you for your reply. It was very helpful. I have another problem: I have changed the locale to pt_pt.iso885...@euro. Now the problem that I reported earlier doesn't occur: print("dúvida") [1] "dúvida" My system information is now the following: Sys.getlocale() [1] lc_ctype=pt_pt.iso885...@euro;LC_NUMERIC=C;lc_time=pt_pt.iso885...@euro;lc_collate=pt_pt.iso885...@euro; LC_MONETARY=C;lc_messages=pt_pt.iso885...@euro;lc_paper=pt_pt.iso885...@euro;LC_NAME=C;LC_ADDRESS=C; LC_TELEPHONE=C;lc_measurement=pt_pt.iso885...@euro;LC_IDENTIFICATION=C [daniel.fernan...@pt-lnx13 ~]$ locale lang=pt_pt.iso885...@euro lc_ctype=pt_pt.iso885...@euro lc_numeric=pt_pt.iso885...@euro lc_time=pt_pt.iso885...@euro lc_collate=pt_pt.iso885...@euro lc_monetary=pt_pt.iso885...@euro lc_messages=pt_pt.iso885...@euro lc_paper=pt_pt.iso885...@euro lc_name=pt_pt.iso885...@euro lc_address=pt_pt.iso885...@euro lc_telephone=pt_pt.iso885...@euro lc_measurement=pt_pt.iso885...@euro lc_identification=pt_pt.iso885...@euro LC_ALL= However, if I use the tcltk package and try to build a frame with the label "dúvida", the string appears incorrectly. My guess is that this problem is also related to the locales. Does Tcl/Tk use a different locale? Thanks in advance, Daniel Date: Fri, 11 Jun 2010 17:54:34 -0400 From: murdoch.dun...@gmail.com To: danielpas...@hotmail.com CC: r-help@r-project.org Subject: Re: [R] R in Linux: problem with special characters daniel fernandes wrote: Hi, I'm working with the 64-bit version of R 2.11.0 for Linux. My session info is: R version 2.11.0 (2010-04-22) x86_64-redhat-linux-gnu locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base When I try to print words with special characters, the result is that the printed expression has some kind of code substituting the special character. For example, if I run print("dúvida") the result is: print("dúvida") [1] "d\372vida" Does this problem have something to do with the locale settings?
If I run the locale command on the Linux server, I get: Yes, it's your locale settings. The C locale doesn't support the ú character in your string, and displays it in octal. Duncan Murdoch [daniel.fernan...@pt-lnx13 ~]$ locale LANG=pt_PT.UTF-8 LC_CTYPE=C LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C LC_ALL=C Thanks in advance for your help, Daniel [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
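Duncan's diagnosis can be checked and remedied from within R by switching LC_CTYPE away from C (a sketch; the exact locale name is system-dependent and is an assumption here):

```r
## the character-type locale controls how non-ASCII characters print
Sys.getlocale("LC_CTYPE")
## locale name is an assumption; list what is installed with `locale -a`
Sys.setlocale("LC_CTYPE", "pt_PT.UTF-8")
```

If the named locale is not installed, Sys.setlocale() returns "" with a warning and the setting is unchanged.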
[R] MICE Package and LMER
Hi R users, I am estimating a multilevel model using lmer. My dataset has missing values and I am using the MICE package to make multiple imputations. Everything works well until I reach the pooling stage using the pool() function. I am able to get a summary of the pooled fixed effects but not the random effects. No errors or warnings are given to me. I checked the help file in R and the developers of MICE noted that "The function pools also estimates obtained with lme() and lmer(), BUT only the fixed part of the model." Does anyone have any ideas on how I can get a summary of pooled random effects? Below is my code: imp <- mice(mydata, m=3, imputationMethod=c("", "", "", "logreg"), maxit=2, pri=F) model <- with(data=imp, lmer(miss ~ sex + age + (1|id) + (1|sch), family=binomial(link="logit"))) result <- pool(model) summary(result) Thanks Trevor -- View this message in context: http://r.789695.n4.nabble.com/MICE-Package-and-LMER-tp2254504p2254504.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
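Since pool() only handles the fixed part, one workaround is to extract the variance components from each imputed-data fit yourself and average them. A sketch, assuming `model` is the mira object returned by with() as in Trevor's code; note that a plain average is only a rough point estimate, not full Rubin's-rules pooling of variance components:

```r
## model$analyses holds one lmer fit per imputed dataset (mice's mira object)
pool_ranef_var <- function(model) {
  fits <- model$analyses
  ## collect the random-effect variances from each fit and average them
  vc <- sapply(fits, function(f) as.data.frame(lme4::VarCorr(f))$vcov)
  rowMeans(vc)   # crude pooled variance components, one entry per RE term
}
```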
[R] recursive function
Dear list, I have the following problem. What I'm trying to do is to build a function which does the following calculation in a recursive way. I have a data frame more or less like this:

variable year DELTA
EC01 2006 /
EC01 2007 10
EC01 2008 5
EC01 2009 9

And then I have at time 2009 a variable called R_EC01(2009)=5. What I have to do is to construct the R_EC01 time series by starting from the 2009 value: R_EC01(2008)=R_EC01(2009)-DELTA(2009) R_EC01(2007)=R_EC01(2008)-DELTA(2008) R_EC01(2006)=R_EC01(2007)-DELTA(2007) In terms of numbers, the results that I should get are: R_EC01(2008)=5-9=-4 R_EC01(2007)=-4-5=-9 R_EC01(2006)=-9-10=-19 so my data frame should look like this:

SERIES YEAR value
R_EC01 2006 -19
R_EC01 2007 -9
R_EC01 2008 -4
R_EC01 2009 5

Does anyone know how to do it? My dataframe is not set as a time series... Thanks a lot!!! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
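No recursion is actually needed: the backward relation R[y-1] = R[y] - DELTA[y] is just a reversed cumulative sum anchored at the 2009 value. A sketch with the numbers from the post:

```r
df <- data.frame(variable = "EC01", year = 2006:2009,
                 DELTA = c(NA, 10, 5, 9))   # no delta for the first year
r_2009 <- 5
## subtract the deltas walking backwards from 2009
df$value <- rev(r_2009 - cumsum(c(0, rev(df$DELTA[-1]))))
df$value
# -19 -9 -4 5
```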
Re: [R] lme command
Hello Enrico, One thing I notice between your two calls is that in the second you specify data=dados, but you do not in the first. When I try to do something similar to your formulae using one of my longitudinal datasets, I get the same results whether or not I put the formula for random in a list. Perhaps you could provide some sample data that shows what is happening? One other comment: rather than specifically adding each term and interaction, you can use this shorthand: Altura ~ (Idade+Idade2)*(sexo+status) Best regards, Josh On Mon, Jun 14, 2010 at 6:37 AM, Enrico Colosimo enrico...@gmail.com wrote: Hi, I am doing a longitudinal data set fit using lme. I used two forms of the lme command and I am getting two different outputs. FIRST out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status, random=list(ident=~Idade+Idade2)) SECOND out <- lme(Altura~Idade+Idade2+sexo+status+Idade:sexo+Idade:status+Idade2:sexo+Idade2:status, random= ~Idade+Idade2|ident, data=dados) I got weird results from the first one and could not understand the reason for it. All the results are exactly the same except the intercept and the two main terms sexo (gender) and status (treatment). Those differences made a lot of difference in the final results. Can anybody tell me what the difference between them is? Thanks. Enrico. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student Health Psychology University of California, Los Angeles __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
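Josh's observation, that the list form and the | form of random= specify the same structure when data= is supplied to both, can be checked on a built-in nlme dataset (a sketch using the Orthodont data that ships with the recommended nlme package):

```r
library(nlme)
## the two random= specifications describe the same random structure
f1 <- lme(distance ~ age, data = Orthodont, random = list(Subject = ~ age))
f2 <- lme(distance ~ age, data = Orthodont, random = ~ age | Subject)
all.equal(fixef(f1), fixef(f2))   # TRUE: identical fixed-effect estimates
```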
Re: [R] Prime Numbers Pkgs
Looking for a recommended package that handles prime number computations. I'm not sure whether this would be helpful to you, but Sage (http://www.sagemath.org) has excellent number theory support and several ways to interface with R (which is included in the distribution of Sage). I use it myself to access several R packages I need to do analysis on data I produce in Sage/Python, and then send the data back to Sage for further processing. Of course, this is not an R package, so depending on how much you need such things from within R itself it may or may not help you out. HTH, Karl-Dieter Crisman __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
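If all that is needed is small-scale prime testing, base R suffices without any package. A minimal, deliberately naive trial-division sketch (fine for small n only):

```r
## naive trial division: correct but O(n) per test, good enough for small n
is_prime <- function(n) n >= 2 && (n == 2 || all(n %% 2:(n - 1) != 0))
(2:30)[sapply(2:30, is_prime)]
# 2 3 5 7 11 13 17 19 23 29
```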
Re: [R] unique problem
Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, and therefore the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column? On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hello everybody, I have a matrix of 2 columns and over 27k rows. Some of the rows are duplicated, so I tried to remove them with the command unique(): Workbook5 <- read.delim(file = "Workbook5.txt") dim(Workbook5) [1] 27748 2 Workbook5 <- unique(Workbook5) dim(Workbook5) [1] 20101 2 It removed a lot of lines, but unfortunately not all of them. I wanted to add the row names to the matrix and got this error message: rownames(Workbook5) <- Workbook5[,1] Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': 'A_51_P102339', 'A_51_P102518', 'A_51_P103435', 'A_51_P103465', 'A_51_P103594', 'A_51_P104409', 'A_51_P104718', 'A_51_P105869', 'A_51_P106428', 'A_51_P106799', 'A_51_P107176', 'A_51_P107959', 'A_51_P108767', 'A_51_P109258', 'A_51_P109708', 'A_51_P110341', 'A_51_P111757', 'A_51_P112427', 'A_51_P112662', 'A_51_P113672', 'A_51_P115018', 'A_51_P116496', 'A_51_P116636', 'A_51_P117666', 'A_51_P118132', 'A_51_P118168', 'A_51_P118400', 'A_51_P118506', 'A_51_P119315', 'A_51_P120093', 'A_51_P120305', 'A_51_P120738', 'A_51_P120785', 'A_51_P121134', 'A_51_P121359', 'A_51_P121412', 'A_51_P121652', 'A_51_P121724', 'A_51_P121829', 'A_51_P122141', 'A_51_P122964', 'A_51_P123422', 'A_51_P123895', 'A_51_P124008', 'A_51_P124719', 'A_51_P125648', 'A_51_P125679', 'A_51_P125779' [... truncated] Is there a better way to discard the duplications in the text file (the origin is an Excel file)? R.version _ platform x86_64-apple-darwin9.8.0 arch x86_64 os darwin9.8.0 system x86_64, darwin9.8.0 status Patched major 2 minor 11.1 year 2010 month 06 day 03 svn rev 52201 language R version.string R version 2.11.1 Patched (2010-06-03 r52201) THX Assa __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
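Jim's point, that unique() on the whole data frame can still leave repeats in the first column alone, is easy to demonstrate with toy data:

```r
df <- data.frame(id = c("a", "a", "b"), val = c(1, 2, 2))
u  <- unique(df)                 # all 3 rows kept: the (id, val) pairs differ
keep <- u[!duplicated(u$id), ]   # keep the first row per id instead
rownames(keep) <- keep$id        # now valid: the ids are unique
```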
Re: [R] logistic regression with 50 variables
On Mon, 14 Jun 2010, Joris Meys wrote: Hi, Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect) which starts to play rather quickly. Ahem! ... minimal, self-contained, reproducible code ... The OP's situation may not be an impossible one: set.seed(54321) dat <- as.data.frame(matrix(rnorm(1800*50), nc=50)) dat$y <- rbinom(1800, 1, plogis(rowSums(dat)/7)) fit <- glm(y~., dat, family=binomial) 1/7 # the true coef [1] 0.1428571 sd(coef(fit)) # roughly, the common standard error [1] 0.05605597 colMeans(coef(summary(fit))) # what glm() got Estimate Std. Error z value Pr(>|z|) 0.14590836 0.05380666 2.71067328 0.06354820 # a trickier case: set.seed(54321) dat <- as.data.frame(matrix(rnorm(1800*50), nc=50)) dat$y <- rbinom(1800, 1, plogis(rowSums(dat))) # try again with coef==1 fit <- glm(y~., dat, family=binomial) colMeans(coef(summary(fit))) Estimate Std. Error z value Pr(>|z|) 0.982944012 0.119063631 8.222138491 0.008458002 Finer examination of the latter fit will show some values that differ too far from 1.0 to agree with the asymptotic std err. sd(coef(fit)) # rather bigger than 0.119 [1] 0.1827462 range(coef(fit)) [1] -0.08128586 1.25797057 And near-separability may be playing here: cbind( + table( + cut( + plogis(abs(predict(fit))), + c( 0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1 ) ) ) ) [,1] (0,0.9] 453 (0.9,0.99] 427 (0.99,0.999] 313 (0.999,0.9999] 251 (0.9999,0.99999] 173 (0.99999,1] 183 Recall that the observed information contains a factor of plogis( predict(fit) )*plogis( -predict(fit) ) hist(plogis( predict(fit) )* plogis( -predict(fit))) So the effective sample size here was much reduced. But to the OP's question, whether what you get is reasonable depends on what the setup is. I wouldn't call the first of the above cases 'highly unstable'. Which is not to say that one cannot generate difficult cases (esp.
with correlated covariates and/or one or more highly influential covariates) and that the OPs case is not one of them. HTH, Chuck The curse of dimensionality is easily demonstrated looking at the proximity between your datapoints. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2=400 observations. in a 10 dimensional space this becomes 20^10 ~ 10^13 datapoints. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave. Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, as your dataspace is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, and then I didn't even start on interactions. No point in trying I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first. Cheers Joris On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 13, 2010, at 10:20 PM, array chip wrote: Hi, this is not R technical question per se. I know there are many excellent statisticians in this list, so here my questions: I have dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks John The general rule of thumb is to have 10-20 'events' per covariate degree of freedom. Frank has suggested that in some cases that number should be as high as 25. The number of events is the smaller of the two possible outcomes for your binary dependent variable. Covariate degrees of freedom refers to the number of columns in the model matrix. 
Continuous variables are 1, binary factors are 1, K-level factors are K - 1. So if out of your 1800 records, you have at least 500 to 1000 events, depending upon how many of your 50 variables are K-level factors and whether or not you need to consider interactions, you may be OK. Better if towards the high end of that range, especially if the model is for prediction versus explanation. Two excellent references would be Frank's book: http://www.amazon.com/Regression-Modeling-Strategies-Frank-Harrell/dp/0387952322/ and Steyerberg's book: http://www.amazon.com/Clinical-Prediction-Models-Development-Validation/dp/038777243X/ to assist in providing guidance for model building/validation techniques. HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained,
Re: [R] points marking
I don't think that I would use a barplot as the base, but rather just set up the graph and add the lines where I wanted them. I still don't understand what you want your graph to look like, or what question you are trying to answer with it (part may be a language barrier). If you can give us a better example of what you are trying to accomplish, or a better description of what your data is like and what you are trying to get from the graph, we will have a better chance of being able to help you. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 From: khush [mailto:bioinfo.kh...@gmail.com] Sent: Saturday, June 12, 2010 5:38 AM To: Greg Snow Cc: r-help@r-project.org; Petr PIKAL Subject: Re: [R] points marking Hi, Well, thanks for letting me know that pch is of no use with segments, Petr. I am using lend as it suits me more, as Gregory suggested, but I am not getting imite??? I think I'll try to fix it with some other method also, as I have to deal more with the symbols in this case. But I want to know one thing from you guys: is the way I am using the code good enough to start? I am not much familiar with this stuff, or is it a dirty way to handle such a task? Please let me know. Thanks, Gregory and Petr. Thank you Jeet On Fri, Jun 11, 2010 at 9:07 PM, Greg Snow greg.s...@imail.org wrote: Those graphs look like chromosome maps; if so, you may want to look into the Bioconductor project, they may have some prewritten functions to do this. If not, the lend argument (see ?par) may be something to look at. If you really want points and segments, you will need to plot the points with the points function and the segments separately. Segments can take vectors, so you don't need to separate things into multiple calls. -- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 From: khush [mailto:bioinfo.kh...@gmail.com] Sent: Friday, June 11, 2010 12:00 AM To: Greg Snow Cc: r-help@r-project.org Subject: Re: [R] points marking Dear Gregory, Thanks for your reply and help. I am explaining my problem again; below is my script for the same.

Dom <- c(195, 568, 559)
fkbp <- barplot(Dom, col = "black", xlab = "", border = NA, space = 7,
                xlim = c(0, 650), ylim = c(0, 87), las = 2, horiz = TRUE)
axis(1, at = seq(0, 600, 10), las = 2)

1. ==== Segments 1 ====
segments(164, 7.8, 192, 7.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(45, 15.8, 138, 15.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(160, 15.8, 255, 15.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(277, 15.8, 378, 15.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(51, 23.8, 145, 23.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(167, 23.8, 262, 23.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)
segments(284, 23.8, 381, 23.8, col = "green", pch = 23, cex = 9, lty = "solid", lwd = 20)

2. ==== Segments 2 ====
segments(399, 15.8, 432, 15.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)
segments(448, 15.8, 475, 15.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)
segments(486, 15.8, 515, 15.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)
segments(401, 23.8, 434, 23.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)
segments(450, 23.8, 475, 23.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)
segments(486, 23.8, 517, 23.8, col = "blue", pch = 21, cex = 9, lty = "solid", lwd = 20)

I solved one part of my query: marking points from one position to another is OK, and I found that it works fine. But I have another issue now: as I am using two segment sets, 1 and 2, I want to draw different shapes for segments 2, so I am giving pch = 21, but it seems to give a solid line for both.
I want to draw different shapes for every chunk of segments; that is the whole point. I want to make a script which can generate such figures; below is a link to one such tool. http://www.expasy.ch/tools/mydomains/ Thank you Jeet On Thu, Jun 10, 2010 at 11:10 PM, Greg Snow greg.s...@imail.org wrote: Your question is not really clear; do either of these examples do what you want?

with(anscombe, plot(x1, y2, ylim = range(y2, y3)))
with(anscombe, points(x1, y3, col = 'blue', pch = 2))
with(anscombe, segments(x1, y2, x1, y3, col = ifelse(y2 > y3, 'green', 'red')))

with(anscombe, plot(x1, y2, ylim = range(y2, y3), type = 'n'))
with(anscombe[order(anscombe$x1), ], polygon(c(x1, rev(x1)), c(y2, rev(y3)), col = 'grey'))

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
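Greg's two suggestions, segments() with the lend setting for the thick bars plus a separate points() call for the shapes, can be combined in one small sketch (all coordinates here are made up, not from the original script):

```r
# A minimal sketch: segments() ignores pch, so draw thick bars with
# segments() (square ends via par(lend = "butt"), see ?par) and add the
# distinct shapes separately with points().
plot(NULL, xlim = c(0, 650), ylim = c(0, 30),
     xlab = "Position", ylab = "", yaxt = "n")
old <- par(lend = "butt")                 # square-ended thick segments
segments(45, 16, 138, 16, col = "green", lwd = 20)   # domain type 1
segments(399, 16, 432, 16, col = "blue", lwd = 20)   # domain type 2
points(c(45, 138, 399, 432), rep(16, 4),
       pch = c(23, 23, 21, 21), bg = "white")        # shapes at the ends
par(old)                                  # restore previous line-end style
```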
[R] Revolutions Blog: May Roundup
I write about R every weekday at the Revolutions blog: http://blog.revolutionanalytics.com and every month I post a summary of articles from the previous month of particular interest to readers of r-help. http://bit.ly/dn7DgR linked to 13 videos for learning R, from the basics (What is R?) to more advanced topics. http://bit.ly/cJUhiY noted the release of R 2.11.1. http://bit.ly/d53tvn announced that Revolution Analytics makes its software available free of charge to the academic community. http://bit.ly/9xCQ83 noted how statistical inference was used to accurately estimate German tank forces in WWII, and linked to an R simulation to verify the calculation. http://bit.ly/b5puKD announced a new website for the R community, www.inside-R.org, sponsored by Revolution Analytics. http://bit.ly/9r85bd linked to a video of economist JD Long explaining why he uses R. http://bit.ly/cySMgE linked to Jeromy Anglim's explanation of the abbreviated names of 150 R functions. http://bit.ly/cEceU8 announced a webinar I gave, Introduction to Revolution R. The live event has passed now, but you can download slides and watch a replay at this link. http://bit.ly/brs2s2 recapped some recent news articles mentioning R and Revolution, in Forbes, The Register, PC World and elsewhere. http://bit.ly/9UDgOL linked to an analysis in R on predicting the outcome of a series of baseball games. http://bit.ly/bJYW9v linked to some materials from the CloudAsia conference on parallel computing in R for life sciences. http://bit.ly/dfb4PA provided a tip on keeping the console window active in the R Productivity Environment GUI. http://bit.ly/aeHO7B announced a webinar on integrating R-based graphs and computations with business intelligence dashboards. (The live event has passed, but you can download slides and a replay at this link.) http://bit.ly/dqr1hc linked to another method of mapping your Twitter social network with R. 
http://bit.ly/bVI6e9 linked to a video by JD Long on using the Amazon Elastic Map-Reduce system to run large-scale map-reduce calculations in the cloud with R. http://bit.ly/auZ7N8 announced Revolution Analytics' development roadmap for 2010. There are new R user groups in Boston (http://bit.ly/cHedm0) and Atlanta (http://bit.ly/aVo1cI). Also, http://bit.ly/a82GAf noted the list of local R User Groups worldwide at MeetUp.com, and how you can request a new group in your area. Other non-R-related stories in the past month included Google's new BigQuery and Prediction API tools (http://bit.ly/bfEeLm), the effects of volcanic ash on a modern city (http://bit.ly/9qWfQf) and (on a lighter note) the dating equation (http://bit.ly/9LR28N) and a neat, practical optical illusion (http://bit.ly/dsmyov). The R Community Calendar has also been updated at: http://blog.revolutionanalytics.com/calendar.html If you're looking for more articles about R, you can find summaries from previous months at http://blog.revolutionanalytics.com/roundups/. Join the REvolution mailing list at http://revolutionanalytics.com/newsletter to be alerted to new articles on a monthly basis. As always, thanks for the comments and please keep sending suggestions to me at da...@revolutionanalytics.com. Don't forget you can also follow the blog using an RSS reader like Google Reader, or by following me on Twitter (I'm @revodavid). Cheers, David -- David M Smith da...@revolutionanalytics.com VP of Marketing, Revolution Analytics http://blog.revolutionanalytics.com Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA)
Re: [R] Html help
Murray Jorgensen wrote: I have just installed R 2.11.1 on my XP laptop. I like HTML help for browsing but text help for on-the-fly look-ups, so I was a bit surprised when I was asked to choose between them during the installation. I chose text, thinking I could fix the HTML help later, which is what I am trying to do now. Now when I ask for HTML help my browser goes to 'http://-ip-number-/doc/html/index.html' instead of where I want on my computer: C:\apps\R\R-2.11.1\doc\html\index.html I can go where I want manually, but then the package list in C:\apps\R\R-2.11.1\doc\html\packages.html does not include all the packages that I have installed and linked. I don't want to read my HTML help from the web because sometimes I am off-line or on a slow connection. How do I go about getting a local set of HTML help files? Since 2.10.0, HTML help is generated on demand. It doesn't go off your local computer; it works locally. This saves a bit of space (the HTML is generated from the same source as the text help), but the main point is that it allows help pages to contain dynamic content. For example, Romain Francois posted some demo code a while ago to allow the display of graphics generated by R within help pages. (Unfortunately it depended on a particular browser feature not supported by Internet Explorer, so I'm going to need to put together something less elegant, but that's life.) Duncan Murdoch
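For the original question of switching to HTML help without reinstalling, one option documented in ?options can be set at run time (a sketch; put it in .Rprofile to make it permanent):

```r
# A small sketch: "help_type" is a documented option (see ?options) that
# selects the help format; no reinstall is needed.
options(help_type = "html")    # use "text" to go back to plain-text help
getOption("help_type")
# help("mean") would now start R's internal help server and open the browser
```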
Re: [R] xtable with Sweave
On Jun 14, 2010, at 8:09 AM, Silvano wrote: Hi, I'm using Sweave to prepare a descriptive report. There are at least 20 tables built with the xtable command, of this kind:

<<echo=F, results=hide>>=
q5 = factor(Q5, label = c("Não", "Sim"))
(q5.tab = cbind(table(q5)))
@

<<echo=F, results=tex>>=
xtable(q5.tab, align = "l|c", caption.placement = "top", table.placement = 'H')
@

I'm getting the following message: Too many unprocessed floats in LaTeX file. How can I avoid these messages? Hi, That is an error message from 'latex' indicating that you may have too many float tables without sufficient separation (e.g. new pages, text in between, etc.) and/or conflicts in table placement. You might have a look at the relevant TeX FAQ here: http://www.tex.ac.uk/cgi-bin/texfaq2html?label=tmupfl Also, while I don't use xtable(), I do believe that you have to call it using the print method directly to specify non-default arguments:

print(xtable(q5.tab, align = "l|c"), caption.placement = "top", table.placement = 'H')

See the help pages for ?xtable and ?print.xtable, including the last examples in the former. HTH, Marc Schwartz
[R] script development for Unconditional Density and Probability estimation
Hello, I'd like to automate this script a bit more and cycle over several parameters (both the species and the metric). For example, where AnnualDepth occurs I need to process about 12 metrics, so instead of writing this entire script 12 times, once for each metric, I'd like to be able to automatically get another metric. Any suggestion will be greatly appreciated. Currently running Windows XP, R 2.11.1 ###

Marsh <- cbind(SoilVegHydro, vegcode)
AnnualDepth <- Marsh[, 'meanAnnualDepthAve']
cattail_0 <- Marsh[, 'cattail'] == '0'  # no; need to run for 8 species, automate if possible
cattail_1 <- Marsh[, 'cattail'] == '1'  # yes; need to run for 8 species
spbase.rate.d1 <- sum(cattail_1) / (sum(cattail_1) + sum(cattail_0))
annualDepth.density <- density(AnnualDepth)  # this line needs to be either interactively defined or to cycle automatically through a number of metrics
cattail.d0.density <- density(AnnualDepth[cattail_0])
cattail.d1.density <- density(AnnualDepth[cattail_1])
cattail.d0.f <- approxfun(cattail.d0.density$x, cattail.d0.density$y)
cattail.d1.f <- approxfun(cattail.d1.density$x, cattail.d1.density$y)
p.d.given.AnnualDepth <- function(AnnualDepth, spbase.rate.d1) {
  p1 <- cattail.d1.f(AnnualDepth) * spbase.rate.d1
  p0 <- cattail.d0.f(AnnualDepth) * (1 - spbase.rate.d1)
  p1 / (p1 + p0)
}
x <- 1:1292
y <- p.d.given.AnnualDepth(x, spbase.rate.d1)
plot(x, y, type = 'l', col = 'red', xlab = 'Mean Annual Depth', main = "Cattail",
     ylab = 'estimated\nProbability(cattail|AnnualDepth)')
plot(cattail.d0.density, col = 'red', lty = 1, main = "")
lines(cattail.d1.density, col = 'blue', lty = 1)
lines(annualDepth.density, col = 'black', lty = 1)
legend(2000, 0.0023, c("No Cattail", "Cattail", "Mean Annual Depth"),
       col = c("red", "blue", "black"), lty = 1)

# Steve Friedman Ph.D.
Spatial Statistical Analyst Everglades and Dry Tortugas National Park 950 N Krome Ave (3rd Floor) Homestead, Florida 33034 steve_fried...@nps.gov Office (305) 224 - 4282 Fax (305) 224 - 4147
Re: [R] Count of unique factors within another factor
Hi Dennis, Thanks for this suggestion (which I got to run!), as this code makes intuitive sense, whereas not all the other suggestions were that straightforward. I'm relatively new to programming in R and am very appreciative that you and others take time to help out where you can. Sincerely, Sarah On Sun, Jun 13, 2010 at 8:47 PM, Dennis Murphy [via R] ml-node+2253845-1393472685-291...@n4.nabble.com wrote: Hi: Another possibility:

as.data.frame(with(data[!duplicated(data), ], table(unit)))
  unit Freq
1  123    3
2  345    4

HTH, Dennis On Sun, Jun 13, 2010 at 9:07 AM, Birdnerd wrote: I have a data frame with two factors (sampling 'unit', 'species'). I want to calculate the number of unique 'species' per 'unit'. I can calculate the number of unique values for each variable separately, but can't get a count for each 'unit'.

data = read.csv("C:/Desktop/sr_sort_practice.csv")
attach(data)
data[1:10, ]
   unit species
1   123    ACMA
2   123    LIDE
3   123    LIDE
4   123    SESE
5   123    SESE
6   123    SESE
7   345    HEAR
8   345    LOHI
9   345    QUAG
10  345    TODI
...
sr.unique <- lapply(data, unique)
$unit
[1] 123 345 216
$species
[1] ACMA LIDE SESE HEAR LOHI QUAG TODI UMCA ARSP LIDE
sapply(sr.unique, length)
   unit species
      3      10

Then I get stuck here, because this unique species count is not given for each 'unit'. What I'd like to get is:

unit species
 123      3
 345      4
 216     --

Thanks
-- Sarah E. Haas haaszool...@gmail.com Center for Applied Geographic Information Science (CAGIS) Department of Geography and Earth Sciences University of North Carolina at Charlotte 9201 University City Blvd. Charlotte, NC 28223, USA http://www.gis.uncc.edu/
Re: [R] xtable with Sweave
Hi Silvano, Silvano wrote: Hi, I'm using Sweave to prepare a descriptive report. There are at least 20 tables built with the xtable command, of this kind:

<<echo=F, results=hide>>=
q5 = factor(Q5, label = c("Não", "Sim"))
(q5.tab = cbind(table(q5)))
@

<<echo=F, results=tex>>=
xtable(q5.tab, align = "l|c", caption.placement = "top", table.placement = 'H')
@

I'm getting the following message: Too many unprocessed floats in LaTeX file. How can I avoid these messages? Not really an R-help question, and easily answered using our friend Google. Anyway, you need to cause the floats to be processed before you have 'too many'. This is done by adding a \clearpage to the LaTeX portion of your document every so often. Best, Jim -- Silvano Cesar da Costa Departamento de Estatística Universidade Estadual de Londrina Fone: 3371-4346 -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826 ** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
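Since the tables are emitted from results=tex chunks, the \clearpage can be emitted from R as well. A sketch (the helper name is illustrative, and xtable is assumed to be installed):

```r
# A sketch of Jim's fix: after each table, cat() a \clearpage so LaTeX
# places pending floats before "too many" accumulate.
emit_table <- function(tab) {
  if (requireNamespace("xtable", quietly = TRUE)) {
    print(xtable::xtable(as.data.frame(tab)), table.placement = "H")
  }
  cat("\\clearpage\n")   # flush LaTeX's pending floats between tables
}
emit_table(table(mtcars$cyl))
```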
Re: [R] script development for Unconditional Density and Probability estimation
First start by putting it in a function so you can specify the parameters you want to change. On Mon, Jun 14, 2010 at 11:54 AM, steve_fried...@nps.gov wrote: Hello, I'd like to automate this script a bit more and cycle over several parameters (both the species and the metric). For example, where AnnualDepth occurs I need to process about 12 metrics, so instead of writing this entire script 12 times, once for each metric, I'd like to be able to automatically get another metric. Any suggestion will be greatly appreciated. Currently running Windows XP, R 2.11.1 ###

Marsh <- cbind(SoilVegHydro, vegcode)
AnnualDepth <- Marsh[, 'meanAnnualDepthAve']
cattail_0 <- Marsh[, 'cattail'] == '0'  # no; need to run for 8 species, automate if possible
cattail_1 <- Marsh[, 'cattail'] == '1'  # yes; need to run for 8 species
spbase.rate.d1 <- sum(cattail_1) / (sum(cattail_1) + sum(cattail_0))
annualDepth.density <- density(AnnualDepth)  # needs to be interactively defined or cycle automatically through a number of metrics
cattail.d0.density <- density(AnnualDepth[cattail_0])
cattail.d1.density <- density(AnnualDepth[cattail_1])
cattail.d0.f <- approxfun(cattail.d0.density$x, cattail.d0.density$y)
cattail.d1.f <- approxfun(cattail.d1.density$x, cattail.d1.density$y)
p.d.given.AnnualDepth <- function(AnnualDepth, spbase.rate.d1) {
  p1 <- cattail.d1.f(AnnualDepth) * spbase.rate.d1
  p0 <- cattail.d0.f(AnnualDepth) * (1 - spbase.rate.d1)
  p1 / (p1 + p0)
}
x <- 1:1292
y <- p.d.given.AnnualDepth(x, spbase.rate.d1)
plot(x, y, type = 'l', col = 'red', xlab = 'Mean Annual Depth', main = "Cattail",
     ylab = 'estimated\nProbability(cattail|AnnualDepth)')
plot(cattail.d0.density, col = 'red', lty = 1, main = "")
lines(cattail.d1.density, col = 'blue', lty = 1)
lines(annualDepth.density, col = 'black', lty = 1)
legend(2000, 0.0023, c("No Cattail", "Cattail", "Mean Annual Depth"),
       col = c("red", "blue", "black"), lty = 1)

# Steve Friedman Ph.D.
Spatial Statistical Analyst Everglades and Dry Tortugas National Park 950 N Krome Ave (3rd Floor) Homestead, Florida 33034 steve_fried...@nps.gov Office (305) 224 - 4282 Fax (305) 224 - 4147 -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
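Jim's suggestion, wrapping the per-species/per-metric logic in a parameterized function, might look like this sketch (all names and the toy data are illustrative, not from the original script):

```r
# A sketch: one function handles any (species, metric) pair; loop over both.
species_metric_prob <- function(data, species, metric) {
  x   <- data[[metric]]
  yes <- data[[species]] == "1"
  base_rate <- mean(yes)
  # yleft/yright = 0 so values outside a density's range get probability 0
  f1 <- approxfun(density(x[yes]),  yleft = 0, yright = 0)  # given presence
  f0 <- approxfun(density(x[!yes]), yleft = 0, yright = 0)  # given absence
  function(v) {                     # estimated P(presence | metric = v)
    p1 <- f1(v) * base_rate
    p0 <- f0(v) * (1 - base_rate)
    p1 / (p1 + p0)
  }
}

set.seed(1)
toy <- data.frame(depth = c(rnorm(50, 0), rnorm(50, 5)),
                  cattail = rep(c("0", "1"), each = 50))
p <- species_metric_prob(toy, "cattail", "depth")
p(5)   # high: depths near 5 are typical of presence records in the toy data
# for (sp in species_names) for (m in metric_names)
#   f <- species_metric_prob(Marsh, sp, m)   # then plot/save as needed
```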
Re: [R] unqiue problem
I thought unique deletes the whole line. I don't really need the row names, but I thought of it as a way of getting the unique items. Is there a way of deleting whole lines completely according to their identifiers? What I really need are unique values in the first column. Assa On Mon, Jun 14, 2010 at 18:04, jim holtman jholt...@gmail.com wrote: Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, hence the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column? On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hello everybody, I have a matrix of 2 columns and over 27k rows. Some of the rows are duplicated, so I tried to remove them with the command unique():

Workbook5 <- read.delim(file = "Workbook5.txt")
dim(Workbook5)
[1] 27748 2
Workbook5 <- unique(Workbook5)
dim(Workbook5)
[1] 20101 2

It removed a lot of lines, but unfortunately not all of them.
I wanted to add the row names to the matrix and got this error message:

rownames(Workbook5) <- Workbook5[, 1]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, : duplicate 'row.names' are not allowed
In addition: Warning message: non-unique values when setting 'row.names': 'A_51_P102339', 'A_51_P102518', 'A_51_P103435', 'A_51_P103465', 'A_51_P103594', 'A_51_P104409', 'A_51_P104718', 'A_51_P105869', 'A_51_P106428', 'A_51_P106799', 'A_51_P107176', 'A_51_P107959', 'A_51_P108767', 'A_51_P109258', 'A_51_P109708', 'A_51_P110341', 'A_51_P111757', 'A_51_P112427', 'A_51_P112662', 'A_51_P113672', 'A_51_P115018', 'A_51_P116496', 'A_51_P116636', 'A_51_P117666', 'A_51_P118132', 'A_51_P118168', 'A_51_P118400', 'A_51_P118506', 'A_51_P119315', 'A_51_P120093', 'A_51_P120305', 'A_51_P120738', 'A_51_P120785', 'A_51_P121134', 'A_51_P121359', 'A_51_P121412', 'A_51_P121652', 'A_51_P121724', 'A_51_P121829', 'A_51_P122141', 'A_51_P122964', 'A_51_P123422', 'A_51_P123895', 'A_51_P124008', 'A_51_P124719', 'A_51_P125648', 'A_51_P125679', 'A_51_P125779' [... truncated]

Is there a better way to discard the duplications in the text file (an Excel file is the origin)?

R.version
platform x86_64-apple-darwin9.8.0
arch x86_64
os darwin9.8.0
system x86_64, darwin9.8.0
status Patched
major 2
minor 11.1
year 2010
month 06
day 03
svn rev 52201
language R
version.string R version 2.11.1 Patched (2010-06-03 r52201)

THX Assa -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
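The approach the replies converge on, keep only the first row for each value in column 1, then use those now-unique IDs as row names, can be shown on toy data (the real IDs are the A_51_P... probe names):

```r
# A self-contained sketch: deduplicate by the first column only, whatever
# the other columns contain, then set row names.
df <- data.frame(id  = c("A_1", "A_1", "A_2", "A_3", "A_3"),
                 val = c(10, 11, 20, 30, 30))
dedup <- df[!duplicated(df$id), ]   # keeps the first row per id
rownames(dedup) <- dedup$id         # now unique, so this succeeds
dedup
```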
Re: [R] merging data frames
If you want to keep only the rows that are unique in the first column then do the following: workComb1 <- subset(workComb, !duplicated(ProbeID)) On Mon, Jun 14, 2010 at 11:20 AM, Assa Yeroslaviz fry...@gmail.com wrote: Well, the problem is basically elsewhere. I have a data frame with expression data and doubled IDs in the first column (see example). When I want to put them into row names, I get the message that there are non-unique items in the data. So I tried with unique to delete such rows; the problem is unique doesn't delete all of them. I compare two data frames by their Probe IDs. I would like to delete all doubled lines with a certain probe ID independent of the rest of the line; that is to say, I would like a data frame with single unique identifiers in the Probe ID column. merge doesn't give me that: it doesn't delete all similar lines, and if the lines are not identical in the other columns it leaves them in the table. Is there a way of deleting the whole line with doubled Probe IDs?

workbook <- read.delim(file = "workbook1.txt", quote = "", sep = "\t")
GeneID <- read.delim(file = "testTable.txt", quote = "", sep = "\t")
workComb <- merge(workbook, GeneID, by.x = "ProbeID", by.y = "Probe.Id")
workComb1 <- unique(workComb)
write.table(workComb, file = "workComb.txt", sep = "\t", quote = FALSE, row.names = FALSE)
write.table(workComb1, file = "workComb1.txt", sep = "\t", quote = FALSE, row.names = FALSE)

Look at lines 49 and 50 in the file workComb1.txt after using unique on the file. The lines are identical with the exception of the Transcript ID; I would like to take one of them out of the table. THX, Assa On Mon, Jun 14, 2010 at 15:33, jim holtman jholt...@gmail.com wrote: Put the rownames as another column in your dataframe so that it remains with the data. After merging, you can then use it as the rownames. On Mon, Jun 14, 2010 at 9:25 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hi, is it possible to merge two data frames while preserving the row names of the bigger data frame?
I have two data frames which I would like to combine. While doing so I always lose the row names. When I try to append them, I get the error message that I have non-unique names, although I used the unique command on the data frame where the double inputs supposedly are. Thanks for the help, Assa -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] Count of unique factors within another factor
Another possibility: rowSums(table(x) > 0) On Sun, Jun 13, 2010 at 3:08 PM, Erik Iverson er...@ccbr.umn.edu wrote: I think ?tapply will help here. But *please* read the posting guide and provide minimal, reproducible examples! Birdnerd wrote: I have a data frame with two factors (sampling 'unit', 'species'). I want to calculate the number of unique 'species' per 'unit'. I can calculate the number of unique values for each variable separately, but can't get a count for each unit.

data = read.csv("C:/Desktop/sr_sort_practice.csv")
attach(data)
data[1:10, ]
   unit species
1   123    ACMA
2   123    LIDE
3   123    LIDE
4   123    SESE
5   123    SESE
6   123    SESE
7   345    HEAR
8   345    LOHI
9   345    QUAG
10  345    TODI
...
sr.unique <- lapply(data, unique)
$unit
[1] 123 345 216
$species
[1] ACMA LIDE SESE HEAR LOHI QUAG TODI UMCA ARSP LIDE
sapply(sr.unique, length)
   unit species
      3      10

Then I get stuck here, because this unique species count is not given for each unit. What I'd like to get is:

unit species
 123      3
 345      4
 216     --

Thanks -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O
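Erik's ?tapply hint can be sketched on data shaped like the poster's (toy values, not the real CSV):

```r
# Count distinct species per sampling unit with tapply().
d <- data.frame(unit = c(123, 123, 123, 123, 345, 345, 345, 216),
                species = c("ACMA", "LIDE", "LIDE", "SESE",
                            "HEAR", "LOHI", "HEAR", "UMCA"))
counts <- tapply(d$species, d$unit, function(s) length(unique(s)))
counts   # a named vector: one unique-species count per unit
```

rowSums(table(d) > 0) gives the same counts, since the unit-by-species contingency table has a positive cell exactly where a species occurs in a unit.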
[R] How we can open the results are saved
Hi all, I saved the result of my code to a file, like save(namefunction, file = "adresse/filename.R"). Now I want to open that file. Could you please help me with how I can open the file and see the result? best Khazaei
Re: [R] How we can open the results are saved
load('adresse/filename.R') On Mon, Jun 14, 2010 at 12:41 PM, khaz...@ceremade.dauphine.fr wrote: Hi all, I saved the result of my code to a file, like save(namefunction, file = "adresse/filename.R"). Now I want to open that file. Could you please help me with how I can open the file and see the result? best Khazaei -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] How we can open the results are saved
No. Binary workspace data are saved by default with the .RData extension and are opened (actually, have their contents added to the current workspace) by load(). .R text files would need to be sourced: source('adresse/filename.R') Bert Gunter Genentech Nonclinical Biostatistics -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jim holtman Sent: Monday, June 14, 2010 9:50 AM To: khaz...@ceremade.dauphine.fr Cc: r-help@r-project.org Subject: Re: [R] How we can open the results are saved load('adresse/filename.R') On Mon, Jun 14, 2010 at 12:41 PM, khaz...@ceremade.dauphine.fr wrote: Hi all, I saved the result of my code to a file, like save(namefunction, file = "adresse/filename.R"). Now I want to open that file. Could you please help me with how I can open the file and see the result? best Khazaei -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
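The save()/load() round trip under discussion, runnable end to end (tempfile() stands in for the poster's path):

```r
# save() writes a binary image; load() restores the objects under their
# original names.  The extension (.RData by convention) does not change
# how load() reads the file, which is why Jim's load() call works even
# on a file named filename.R.
f <- tempfile(fileext = ".RData")
results <- list(coef = c(1.2, -0.4), n = 100)
save(results, file = f)
rm(results)
load(f)          # 'results' is back in the workspace
results$n
```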
Re: [R] unqiue problem
On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote: I thought unique deletes the whole line. I don't really need the row names, but I thought of it as a way of getting the unique items. Is there a way of deleting whole lines completely according to their identifiers? What I really need are unique values in the first column. Assa On Mon, Jun 14, 2010 at 18:04, jim holtman jholt...@gmail.com wrote: Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, hence the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column? On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hello everybody, I have a matrix of 2 columns and over 27k rows. Some of the rows are duplicated, so I tried to remove them with the command unique():

Workbook5 <- read.delim(file = "Workbook5.txt")
dim(Workbook5)
[1] 27748 2
Workbook5 <- unique(Workbook5)

Jim already showed you one way in another thread, and it is probably more intuitive than this way, but just so you know...

Workbook5 <- Workbook5[!duplicated(Workbook5[, 1]), ]

... should have worked: logical indexing on the first column, returning both columns of the qualifying rows. -- David.

dim(Workbook5)
[1] 20101 2

It removed a lot of lines, but unfortunately not all of them.
I wanted to add the row names to the matrix and got this error message:
rownames(Workbook5) <- Workbook5[,1]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, : duplicate 'row.names' are not allowed
In addition: Warning message: non-unique values when setting 'row.names': 'A_51_P102339', 'A_51_P102518', 'A_51_P103435', 'A_51_P103465', 'A_51_P103594', 'A_51_P104409', 'A_51_P104718', 'A_51_P105869', 'A_51_P106428', 'A_51_P106799', 'A_51_P107176', 'A_51_P107959', 'A_51_P108767', 'A_51_P109258', 'A_51_P109708', 'A_51_P110341', 'A_51_P111757', 'A_51_P112427', 'A_51_P112662', 'A_51_P113672', 'A_51_P115018', 'A_51_P116496', 'A_51_P116636', 'A_51_P117666', 'A_51_P118132', 'A_51_P118168', 'A_51_P118400', 'A_51_P118506', 'A_51_P119315', 'A_51_P120093', 'A_51_P120305', 'A_51_P120738', 'A_51_P120785', 'A_51_P121134', 'A_51_P121359', 'A_51_P121412', 'A_51_P121652', 'A_51_P121724', 'A_51_P121829', 'A_51_P122141', 'A_51_P122964', 'A_51_P123422', 'A_51_P123895', 'A_51_P124008', 'A_51_P124719', 'A_51_P125648', 'A_51_P125679', 'A_51_P125779 [... truncated]
Is there a better way to discard the duplications in the text file (an Excel file is the origin)?
R.version
               _
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status         Patched
major          2
minor          11.1
year           2010
month          06
day            03
svn rev        52201
language       R
version.string R version 2.11.1 Patched (2010-06-03 r52201)
THX Assa -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
David Winsemius, MD West Hartford, CT
Re: [R] unqiue problem
On Jun 14, 2010, at 1:10 PM, David Winsemius wrote: On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote: I thought unique deleted the whole line. I don't really need the row names, but I thought of it as a way of getting the unique items. Is there a way of deleting whole lines completely according to their identifiers? What I really need are unique values in the first column. Assa On Mon, Jun 14, 2010 at 18:04, jim holtman jholt...@gmail.com wrote: Your process does remove all the duplicate entries based on the content of the two columns. After you do this, there are still duplicate entries in the first column that you are trying to use as rownames, and therefore the error. Why do you want to use non-unique entries as rownames? Do you really need the row names, or should you only be keeping unique values for the first column? On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz fry...@gmail.com wrote: Hello everybody, I have a matrix of 2 columns and over 27k rows. Some of the rows are duplicated, so I tried to remove them with the command unique():
Workbook5 <- read.delim(file = "Workbook5.txt")
dim(Workbook5)
[1] 27748 2
Workbook5 <- unique(Workbook5)
Jim already showed you one way in another thread and it is probably more intuitive than this way, but just so you know...
Workbook5 <- Workbook5[ unique(Workbook5[ ,1]) , ]
... should have worked. Logical indexing on the first column with return of both columns of qualifying rows. Actually I was thinking a bit askew, although that would have succeeded. That was not logical indexing, which would have been done with duplicated() ... or rather its negation through the use of the ! unary operator:
str(unique(Workbook5[ ,1]))
 Factor w/ 17209 levels "A_51_P100034",..: 1 2 3 4 5 6 7 8 9 10 ...
str(!duplicated(Workbook5[ ,1]))
 logi [1:20101] TRUE TRUE TRUE TRUE TRUE TRUE ...
So this would have been the way to do it with logical indexing:
Workbook5 <- Workbook5[ !duplicated(Workbook5[ ,1]) , ]
-- David.
dim(Workbook5)
[1] 20101 2
It removed a lot of lines, but unfortunately not all of them. I wanted to add the row names to the matrix and got this error message:
rownames(Workbook5) <- Workbook5[,1]
Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L, : duplicate 'row.names' are not allowed
In addition: Warning message: non-unique values when setting 'row.names': 'A_51_P102339', 'A_51_P102518', 'A_51_P103435', 'A_51_P103465', ... [... truncated]
Is there a better way to discard the duplications in the text file (an Excel file is the origin)?
R.version
               _
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status         Patched
major          2
minor          11.1
year           2010
month          06
day            03
svn rev        52201
language       R
version.string R version 2.11.1 Patched (2010-06-03 r52201)
THX Assa -- David Winsemius, MD West Hartford, CT
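Putting the thread's conclusion together, here is a minimal sketch (with made-up IDs) of deduplicating on the first column and then assigning row names:

```r
## Hypothetical data: duplicated IDs in column 1, as in Assa's file.
Workbook5 <- data.frame(ID    = c("A1", "A1", "B2", "B2", "C3"),
                        value = c(10, 10, 20, 30, 40))
## keep only the first row for each ID; row-name assignment then succeeds
Workbook5 <- Workbook5[!duplicated(Workbook5$ID), ]
rownames(Workbook5) <- Workbook5$ID
```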
Re: [R] Overlay of barchart and xyplot
Thanks a lot! Huapeng -Original Message- From: foolish.andr...@gmail.com [mailto:foolish.andr...@gmail.com] On Behalf Of Felix Andrews Sent: Friday, June 11, 2010 8:23 PM To: Chen, Huapeng FOR:EX Cc: r-help@r-project.org Subject: Re: [R] Overlay of barchart and xyplot Hi, I have an example below of adding a key to the merged plot. You cannot have the key on the right-hand side because that viewport is used by the second ylab (ylab2 from doubleYScale). Well, if you really wanted to, you could do it with the grid package, using frameGrob or some such.
NTLST_Dispersal_VAR_00_08$Year <- factor(NTLST_Dispersal_VAR_00_08$Year,
    levels = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007), ordered = TRUE)
dispersal <- barchart(LDP_PER*100 + SPP_PER*100 + SPG_PER*100 ~ Year | District,
    data = NTLST_Dispersal_VAR_00_08, stack = TRUE, layout = c(5,5),
    scales = list(x = list(rot = 90)), xlab = "Year", ylab = "%",
    strip = strip.custom(bg = "light gray"),
    par.settings = simpleTheme(col = c("dark gray", "light gray", "white")),
    auto.key = list(points = FALSE, rectangles = TRUE))
vars <- xyplot(sqrt(Infestation_NUM) + AI ~ Year | District,
    data = NTLST_Dispersal_VAR_00_08, layout = c(5,5), type = "b",
    ylab = "Square roots of number of infested cells/Landscape aggregation index",
    auto.key = list(lines = TRUE))
dblplot <- doubleYScale(dispersal, vars, use.style = FALSE, add.ylab2 = TRUE)
dblplot <- update(dblplot,
    par.settings = simpleTheme(fill = c("white", "dark gray", "black"),
        border = "black", col.line = "black", col.points = "black",
        pch = c(16,17), lty = c(1,1,1,2,1)))
## include second key at the bottom
update(dblplot, legend = list(bottom = vars$legend$top))
## Otherwise you could just include a key argument in the first plot which includes all the items explicitly.
## Or merge the two 'auto.key's at the top:
mergeLegends <- function(a, b, ...)
{
  g <- frameGrob()
  if (!inherits(a, "grob")) {
    a <- eval(as.call(c(as.symbol(a$fun), a$args)), getNamespace("lattice"))
  }
  if (!inherits(b, "grob")) {
    b <- eval(as.call(c(as.symbol(b$fun), b$args)), getNamespace("lattice"))
  }
  g <- packGrob(g, a, side = "left")
  packGrob(g, b, side = "right")
}
update(dblplot, legend = list(top = list(fun = mergeLegends,
    args = list(a = dispersal$legend$top, b = vars$legend$top))))
On 5 June 2010 04:49, Chen, Huapeng FOR:EX huapeng.c...@gov.bc.ca wrote: Hi Felix, Thanks for your help and advice. The following code is close to what I want, but it still fails to customize the lines and add a key in any way. par.settings with the final plot seems not to work somehow, except pch and lty, but they overwrite the par.settings given to barchart. I also attached the data I used (via dput). I appreciate your further help. Thanks, Huapeng
# code #
NTLST_Dispersal_VAR_00_08$Year <- factor(NTLST_Dispersal_VAR_00_08$Year,
    levels = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007), ordered = TRUE)
dispersal <- barchart(NTLST_Dispersal_VAR_00_08$LDP_PER*100 +
    NTLST_Dispersal_VAR_00_08$SPP_PER*100 + NTLST_Dispersal_VAR_00_08$SPG_PER*100 ~
    NTLST_Dispersal_VAR_00_08$Year | NTLST_Dispersal_VAR_00_08$District,
    data = NTLST_Dispersal_VAR_00_08, horizontal = FALSE, stack = TRUE,
    layout = c(5,5), xlab = "Year", ylab = "%",
    strip = strip.custom(bg = "light gray"),
    par.settings = simpleTheme(col = c("dark gray", "light gray", "white")),
    #key=list(space="right", size=10,
    #  rectangles=list(size=1.7, border="black", col = c("white", "dark gray", "black")),
    #  lines=list(pch=c(16,17), lty=c(1,2), col="black", type="b"),
    #  text=list(text=c("SPG","SPP","LDP")))
    #auto.key=TRUE
    )
xyplot(sqrt(NTLST_Dispersal_VAR_00_08$Infestation_NUM) + NTLST_Dispersal_VAR_00_08$AI ~
    NTLST_Dispersal_VAR_00_08$Year | NTLST_Dispersal_VAR_00_08$District,
    data = NTLST_Dispersal_VAR_00_08, layout = c(5,5), type = "b",
    ylab = "Square roots of number of infested cells/Landscape aggregation index",
    #par.settings = simpleTheme(col = c("black", "black"), pch=c(16,17)),
    #key=list(space="right", size=10,
    #  rectangles=list(size=1.7, border="black", col = c("white", "dark gray", "black")),
    #  lines=list(pch=c(16,17), lty=c(1,2), col="black", type="b"),
    #
Re: [R] Prime Numbers Pkgs - Schoolmath is broken
On Mon, Jun 14, 2010 at 7:16 AM, Red Roo redr...@yahoo.com wrote: Looking for a recommended package that handles prime number computations. The gmp package (http://crantastic.org/packages/gmp) has some good tools for prime numbers. I've used the isprime function before; it's stochastic (in the sense that it gives the right answer with a high probability), but very fast for large numbers. I'm pretty sure there are exact functions for testing and generating primes too. # David Smith -- David M Smith da...@revolutionanalytics.com VP of Marketing, Revolution Analytics http://blog.revolutionanalytics.com Tel: +1 (650) 330-0553 x205 (Palo Alto, CA, USA)
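For reference, a short sketch of the gmp calls mentioned above (this assumes the gmp package is installed; its isprime() returns 0 for composite, 1 for probably prime, and 2 for provably prime):

```r
library(gmp)
isprime(7919)                            # small numbers are tested exactly
isprime(as.bigz("2305843009213693951"))  # 2^61 - 1, a Mersenne prime
nextprime(as.bigz(10)^9)                 # next prime above one billion
```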
Re: [R] recursive function
Hi! Do you mean something like this (df is your original data frame)?
--- cut here ---
df1 <- df
df1[[1]] <- paste("R", df[[1]], sep="_")
colnames(df1) <- c("SERIES", "YEAR", "value")
df1$value[df1$YEAR == 2009] <- 5
for (i in c(2009:2007)) {
  df1$value[df1$YEAR == (i-1)] <- df1$value[df1$YEAR == i] - df$DELTA[df$year == i]
}
--- cut here ---
Now the output:
df1
  SERIES YEAR value
1 R_EC01 2006   -19
2 R_EC01 2007    -9
3 R_EC01 2008    -4
4 R_EC01 2009     5
Please let me know if you were looking for a more general approach suitable for larger data frames with e.g. several variable classes (EC01, EC02, etc.). Kind regards, Kimmo -- University of Turku, Finland Dep. of Political Science and Contemporary History
Re: [R] How we can open the results are saved
On 14/06/2010 17:50, jim holtman wrote: load('adresse/filename.R') Or: attach('adresse/filename.R') The difference between 'load' and 'attach' is that 'load' puts the contents of the file into your workspace (global environment, first location on the search list), while 'attach' creates a new location on the search list. But calling this 'filename.R' is likely to lead to trouble. The '.R' suffix is usually used for files containing R commands that can be used by the 'source' function. The usual convention for files created by 'save' is to use a '.rda' suffix. Pat On Mon, Jun 14, 2010 at 12:41 PM, khaz...@ceremade.dauphine.fr wrote: Hi all I saved the result of my code as a file, like save(namefunction, file="adresse/filename.R"). I want to open the file. Could you please help me how I can open the file and see the result. best Khazaei -- Patrick Burns pbu...@pburns.seanet.com http://www.burns-stat.com (home of 'Some hints for the R beginner' and 'The R Inferno')
Re: [R] recursive function
Try this:
transform(x, DELTA = NULL, value = rev(c(5, 5 - cumsum(rev(DELTA[-1])))))
On Mon, Jun 14, 2010 at 12:29 PM, n.via...@libero.it wrote: Dear list, I have the following problem: what I'm trying to do is to build a function which does the following calculation in a recursive way. I have a data frame more or less like this:
variable year DELTA
EC01     2006 /
EC01     2007 10
EC01     2008 5
EC01     2009 9
And then I have at time 2009 a variable called R_EC01(2009)=5. What I have to do is to construct the R_EC01 time series by starting from the 2009 value:
R_EC01(2008)=R_EC01(2009)-DELTA(2009)
R_EC01(2007)=R_EC01(2008)-DELTA(2008)
R_EC01(2006)=R_EC01(2007)-DELTA(2007)
In terms of numbers, the results that I should get are:
R_EC01(2008)=5-9=-4
R_EC01(2007)=-4-5=-9
R_EC01(2006)=-9-10=-19
So my data frame should look like this:
SERIES YEAR value
R_EC01 2006   -19
R_EC01 2007    -9
R_EC01 2008    -4
R_EC01 2009     5
Anyone knows how to do it? My data frame is not set as a time series... Thanks a lot!!! -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
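Tracing Henrique's one-liner on the poster's own numbers shows why it works (a worked check, not part of the original reply):

```r
## DELTA for 2006 is unknown ('/'), hence the NA; 5 is the known 2009 value.
x <- data.frame(variable = "EC01", year = 2006:2009, DELTA = c(NA, 10, 5, 9))
## reverse the deltas, accumulate, subtract from 5, then reverse back:
rev(c(5, 5 - cumsum(rev(x$DELTA[-1]))))   # -19 -9 -4 5
```

Reading inside out: rev(DELTA[-1]) gives (9, 5, 10), cumsum gives (9, 14, 24), and 5 minus those gives the 2008, 2007, 2006 values.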
Re: [R] executable script
In Python, it is literally this easy:
import rpy2.robjects as robjects
robjects.r('source("C:/YOUR R FILE GOES HERE")')
Type the name of your R source file into this script and save it as a Python script (add the suffix .py), and then you can run it by double-clicking. If you want to see the results of your program, you'll have to have R print them to a file. Now, to get Python to run on your computer, you'll have to install the following: http://www.python.org/download/releases/2.6.5/ Python 2.6.5 http://sourceforge.net/projects/numpy/files/ NumPy 1.4.1 http://sourceforge.net/projects/rpy/files/rpy2/ Rpy2 2.0.8 (Note that these are the versions I currently have on my computer; there may be more recent ones out there.) You will also have to add Python to your computer's Path. To do this in Windows: 1. Right-click on My Computer, select Properties 2. On the Advanced tab, click Environment Variables 3. In the System Variables list, select Path and click Edit 4. In the Variable Value line, enter C:\Python26\; at the beginning of the line (or C:\Python30\; if you choose to install Python 3.0) Now you can run Python scripts simply by double-clicking them. This makes it very easy to turn R code into an executable script; the only caveat is that you need to have Python installed on any computer that you want to run your script on. -- View this message in context: http://r.789695.n4.nabble.com/executable-script-tp839859p2254813.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] How we can open the results are saved
On Mon, 14 Jun 2010, Patrick Burns wrote: On 14/06/2010 17:50, jim holtman wrote: load('adresse/filename.R') Or: attach('adresse/filename.R') The difference between 'load' and 'attach' is that 'load' puts the contents of the file into your workspace (global environment, first Not necessarily: see the envir argument. location on the search list), while 'attach' creates a new location on the search list. location = environment here. But calling this 'filename.R' is likely to lead to trouble. The '.R' suffix is usually used for files containing R commands that can be used by the 'source' function. The usual convention for files created by 'save' is to use a '.rda' suffix. Or .RData or .Rdata for those not DOS-centric: see the examples in the help file. Pat On Mon, Jun 14, 2010 at 12:41 PM, khaz...@ceremade.dauphine.fr wrote: Hi all I saved the result of my code as a file, like save(namefunction, file="adresse/filename.R"). I want to open the file. Could you please help me how I can open the file and see the result. best Khazaei -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
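The load()/attach() contrast is easy to see interactively (the file name here is a placeholder):

```r
x <- 42
save(x, file = "results.rda")
rm(x)
attach("results.rda")         # adds "file:results.rda" to search()
x                             # found via the search path; workspace untouched
detach("file:results.rda")
load("results.rda")           # this time 'x' lands in the global environment
```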
Re: [R] logistic regression with 50 varaibales
Thanks Charles for the reproducible code. I started this question because I was asked to take a look at such a dataset, but I have doubts about whether it's meaningful to do a LR with 50 variables. I haven't got the dataset yet, thus have not tried any code. But thanks again for sharing some simulation code. I have one question about your code: when you calculate common standard errors for coefficients using sd(coef(fit)), should you exclude the intercept by doing sd(coef(fit)[-1])? Actually, after removing the intercept, the standard error calculated this way is very similar to the one reported from colMeans(coef(summary(fit))), for both scenarios in your example (coef = 0.14 or 1.0). Another question: the 50 simulated variables have very low correlations (ranging from -0.1 to 0.08), which may contribute to the stable model. If some (not all) of the 50 variables had considerable correlation, like 0.7 or 0.8, how would the LR model behave? Thanks John - Original Message From: Charles C. Berry cbe...@tajo.ucsd.edu To: Joris Meys jorism...@gmail.com Cc: r-help@r-project.org; Marc Schwartz marc_schwa...@me.com Sent: Mon, June 14, 2010 8:32:02 AM Subject: Re: [R] logistic regression with 50 varaibales On Mon, 14 Jun 2010, Joris Meys wrote: Hi, Marc's explanation is valid to a certain extent, but I don't agree with his conclusion. I'd like to point out the curse of dimensionality (Hughes effect), which starts to play a role rather quickly. Ahem! ... minimal, self-contained, reproducible code ... The OP's situation may not be an impossible one:
set.seed(54321)
dat <- as.data.frame(matrix(rnorm(1800*50), nc=50))
dat$y <- rbinom(1800, 1, plogis(rowSums(dat)/7))
fit <- glm(y ~ ., dat, family=binomial)
1/7  # the true coef
[1] 0.1428571
sd(coef(fit))  # roughly, the common standard error
[1] 0.05605597
colMeans(coef(summary(fit)))  # what glm() got
  Estimate Std. Error    z value   Pr(>|z|)
0.14590836 0.05380666 2.71067328 0.06354820
# a trickier case:
set.seed(54321)
dat <- as.data.frame(matrix(rnorm(1800*50), nc=50))
dat$y <- rbinom(1800, 1, plogis(rowSums(dat)))  # try again with coef==1
fit <- glm(y ~ ., dat, family=binomial)
colMeans(coef(summary(fit)))
   Estimate  Std. Error     z value    Pr(>|z|)
0.982944012 0.119063631 8.222138491 0.008458002
Finer examination of the latter fit will show some values that differ too far from 1.0 to agree with the asymptotic std err.
sd(coef(fit))  # rather bigger than 0.119
[1] 0.1827462
range(coef(fit))
[1] -0.08128586  1.25797057
And near separability may be playing here:
cbind(table(cut(plogis(abs(predict(fit))),
                c(0, 0.9, 0.99, 0.999, 0.9999, 0.99999, 1))))
                 [,1]
(0,0.9]           453
(0.9,0.99]        427
(0.99,0.999]      313
(0.999,0.9999]    251
(0.9999,0.99999]  173
(0.99999,1]       183
Recall that the observed information contains a factor of plogis(predict(fit)) * plogis(-predict(fit))
hist(plogis(predict(fit)) * plogis(-predict(fit)))
So the effective sample size here was much reduced. But to the OP's question, whether what you get is reasonable depends on what the setup is. I wouldn't call the first of the above cases 'highly unstable'. Which is not to say that one cannot generate difficult cases (esp. with correlated covariates and/or one or more highly influential covariates) and that the OP's case is not one of them. HTH, Chuck The curse of dimensionality is easily demonstrated by looking at the proximity between your data points. Say we scale the interval in one dimension to be 1 unit. If you have 20 evenly-spaced observations, the distance between the observations is 0.05 units. To have a proximity like that in a 2-dimensional space, you need 20^2 = 400 observations; in a 10-dimensional space this becomes 20^10 ~ 10^13 data points. The distance between your observations is important, as a sparse dataset will definitely make your model misbehave.
Even with about 35 samples per variable, using 50 independent variables will render a highly unstable model, as your data space is about as sparse as it can get. On top of that, interpreting a model with 50 variables is close to impossible, and then I didn't even start on interactions. No point in trying, I'd say. If you really need all that information, you might want to take a look at some dimension reduction methods first. Cheers Joris On Mon, Jun 14, 2010 at 2:55 PM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 13, 2010, at 10:20 PM, array chip wrote: Hi, this is not an R technical question per se. I know there are many excellent statisticians on this list, so here are my questions: I have a dataset with ~1800 observations and 50 independent variables, so there are about 35 samples per variable. Is it wise to build a stable multiple logistic model with 50 independent variables? Any problem with this approach? Thanks John The general rule of thumb is to have 10-20 'events' per covariate degree of freedom. Frank has suggested that
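John's follow-up question (how the fit behaves when covariates are substantially correlated) can be explored with a small variation of Chuck's simulation. The construction below, which induces roughly 0.7 pairwise correlation through a shared factor, is a sketch of one way to try it, not part of the original thread:

```r
## Variation on Chuck's simulation with correlated covariates (a sketch).
## Each column shares a common factor z, giving pairwise correlation ~ rho.
set.seed(54321)
n <- 1800; p <- 50; rho <- 0.7
z <- rnorm(n)
dat <- as.data.frame(sqrt(rho) * z +
                     sqrt(1 - rho) * matrix(rnorm(n * p), ncol = p))
dat$y <- rbinom(n, 1, plogis(rowSums(dat) / 7))
fit <- glm(y ~ ., dat, family = binomial)
sd(coef(fit)[-1])                            # spread of the slope estimates
mean(coef(summary(fit))[-1, "Std. Error"])   # vs. the reported std. errors
```

Comparing the two numbers indicates how much the collinearity inflates the variability of individual coefficients relative to the independent-covariate case.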
[R] installing RExcel package
Was wondering if anyone has any experience installing the RExcel package by hand. I think I have all the files needed, but our firewall here prevents RExcelInstaller from going through the internet to get them as it wants to, and it just gives up. Any ideas? Thanks. --Sam
Re: [R] installing RExcel package
You can normally get through the firewall by using the internet2 option. Use ??internet for the exact function name. I am not at my computer now, so I can't check for you. Rich
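For reference, the function Rich appears to mean is setInternet2() (Windows-only, in the utils package of R versions of that era), which makes R use Internet Explorer's connection and proxy settings:

```r
## Windows only; available in contemporary 2.x versions of R.
setInternet2(TRUE)
install.packages("RExcelInstaller")  # retry the download behind the proxy
```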
[R] Zero counts lost with table() in functions
Hi, I am collecting replies from a survey and counting the replies with the table() function. The function below takes two data frames and counts the observations of the findings in the first parameter vector given the value of the second, as shown in the code below. My trouble is that the vector kp_vec at the end of the inner loop seems to lose the value when table() counts zero occurrences of one of the outcomes. I will have
 y  n
34  0
This is not picked up in the matrix row after the loops, which ends up with something like
y n y Funding Survival 12 5 34
where the last n value is missing. This causes my returned data frame to fail and is, all in all, rather miserable for the plot. I see the point of this in a way, so I believe it is not a bug. I'd love to get my zero back. Is it a subtle point in R I have missed?
kpi_test <- function(paramvec, kpivec) {
  kp_vec <- c()
  res_kpi_y <- c()
  res_kpi_n <- c()
  tmp_param <- c()
  tmp_kpi <- c()
  for (param_no in seq(from=1, to=length(paramvec), by=1)) {
    tmp_param <- paramvec[param_no]
    for (kpi_no in seq(from=1, to=length(kpivec), by=1)) {
      tmp_kpi <- kpivec[kpi_no]
      res_kpi_y <- table(tmp_param[tmp_kpi == 'y'])
      res_kpi_n <- table(tmp_param[tmp_kpi == 'n'])
      kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n)
    }
  }
  matrix_vector <- matrix(kp_vec, ncol=6, byrow=TRUE)
  fres <- data.frame(matrix_vector)
  return(fres)
}
ottar
[R] how to make a barplot similar to Excel’s “clustered column chart”.
I have a matrix with 12 rows (one for each month) and 2 columns (baseflow, runoff). I would like to make a barplot similar to Excel's "clustered column chart". Here is my matrix 'x':
 8.258754 13.300710
10.180953 10.760465
11.012184 13.954887
10.910870 13.839839
 9.023519 11.511129
 7.189241 12.519830
 5.925576 17.101491
 5.211613 13.585175
 5.039592 13.506304
 4.462325  9.963006
 5.586521 11.306202
 7.739242 14.669374
If I use barplot(x, beside=T), column 1 appears on the left side of the plot, and then column 2 appears on the right side of the plot. I would rather have them alternate, so that for month 1 I see 2 bars (column 1, then column 2), and so forth for months 2-12. Is there a simple way to do this? (In Excel this is the chart option for "clustered column".)
[R] Subtracting POSIXct data/times
I have two dataframe columns of POSIXct date/times that include seconds. I got them into this format using, for example,
zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y %H:%M:%S")
My problem is that when I subtract the two columns, sometimes the difference is given in seconds, and sometimes it is given in other units. I don't care which it is, but I need to know which one I will get.
DateTime            ETA
2010-05-16 02:19:56 2010-05-16 03:46:35
...
Browse[1] mins = zsort$ETA - zsort$DateTime
Browse[1] mins
Time differences in hours
[1] 1.444167 2.685000 3.077222 3.210278 3.248056 3.281944 3.281944 3.360278 3.360278 3.582778 4.57 5.506111 5.857778 6.150278 6.150278 6.243056 6.243889 6.248056 6.248611 6.248611 6.356667
attr(,"tzone")
But sometimes the answer is in seconds.
# make a column with the minutes before landing
zsort$MinBeforeLand = zsort$ETA - zsort$DateTime
zsort$MinBeforeLand
Time differences in secs
[1] -50 136 221 878 1192 2263 3296 3959 4968 5846 8709 11537 12198 12442 12642 15952 18273 19952 20538
How do I specify the resultant units? Thanks, Jim Rome
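To answer the units question directly: the `-` method for POSIXct picks units heuristically, but difftime() lets you fix them (a sketch using made-up times):

```r
t1 <- as.POSIXct("2010-05-16 02:19:56")
t2 <- as.POSIXct("2010-05-16 03:46:35")
t2 - t1                                       # units chosen automatically
difftime(t2, t1, units = "mins")              # always minutes
as.numeric(difftime(t2, t1, units = "secs"))  # plain number of seconds
```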
Re: [R] Zero counts lost with table() in functions
Ottar Kvindesland wrote: Hi, I am collecting replies from a survey and counting the replies with the table() function. The function below takes two data frames and counts the observations of the findings in the first parameter vector given the value of the second, as shown in the code below. My trouble is that the vector kp_vec at the end of the inner loop seems to lose the value when table() counts zero occurrences of one of the outcomes. I will have
 y  n
34  0
This is not picked up in the matrix row after the loops, which ends up with something like
y n y Funding Survival 12 5 34
where the last n value is missing. This causes my returned data frame to fail and is, all in all, rather miserable for the plot. I see the point of this in a way, so I believe it is not a bug. I'd love to get my zero back. Is it a subtle point in R I have missed? I don't know about subtle, but the general idea is that you can only expect to get a zero count if the set of possible values is known in advance to include that value. Otherwise there's just no end to the number of zeros to include! This can be obtained by making the object you tabulate into a factor with the relevant levels set:
table(y)
y
1
table(factor(y, levels = c("y", "n")))
y n
1 0
Incidentally, this is the main reason R doesn't by default drop unused levels when subsetting. I can't make heads or tails of your code, but I hope the above helps solve your issue.
-pd
kpi_test <- function(paramvec, kpivec) {
  kp_vec <- c()
  res_kpi_y <- c()
  res_kpi_n <- c()
  tmp_param <- c()
  tmp_kpi <- c()
  for (param_no in seq(from=1, to=length(paramvec), by=1)) {
    tmp_param <- paramvec[param_no]
    for (kpi_no in seq(from=1, to=length(kpivec), by=1)) {
      tmp_kpi <- kpivec[kpi_no]
      res_kpi_y <- table(tmp_param[tmp_kpi == 'y'])
      res_kpi_n <- table(tmp_param[tmp_kpi == 'n'])
      kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n)
    }
  }
  matrix_vector <- matrix(kp_vec, ncol=6, byrow=TRUE)
  fres <- data.frame(matrix_vector)
  return(fres)
}
ottar -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
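Peter's fix in miniature, with a toy reply vector (made up for illustration):

```r
replies <- c("y", "y", "n", "y")
table(replies)                      # counts only the values it sees
f <- factor(replies, levels = c("y", "n"))
table(f[f == "y"])                  # the zero 'n' count is now kept
```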
Re: [R] Zero counts lost with table() in functions
Ottar Kvindesland wrote: Hi, I am collecting replies from a survey and counting the replies with the table() function. The function below takes two data frames and counts the observations of the findings in the first parameter vector given the value of the second, as shown in the code below. My trouble is that the vector kp_vec at the end of the inner loop seems to lose the value when table() counts zero occurrences of one of the outcomes. I will have
 y  n
34  0
This is not picked up in the matrix row after the loops, which ends up with something like
y n y Funding Survival 12 5 34
where the last n value is missing. This causes my returned data frame to fail and is, all in all, rather miserable for the plot. I see the point of this in a way, so I believe it is not a bug. I'd love to get my zero back. Is it a subtle point in R I have missed?
kpi_test <- function(paramvec, kpivec) {
  kp_vec <- c()
  res_kpi_y <- c()
  res_kpi_n <- c()
  tmp_param <- c()
  tmp_kpi <- c()
  for (param_no in seq(from=1, to=length(paramvec), by=1)) {
    tmp_param <- paramvec[param_no]
    for (kpi_no in seq(from=1, to=length(kpivec), by=1)) {
      tmp_kpi <- kpivec[kpi_no]
      res_kpi_y <- table(tmp_param[tmp_kpi == 'y'])
      res_kpi_n <- table(tmp_param[tmp_kpi == 'n'])
      kp_vec <- c(kp_vec, names(tmp_param), names(tmp_kpi), res_kpi_y, res_kpi_n)
    }
  }
  matrix_vector <- matrix(kp_vec, ncol=6, byrow=TRUE)
  fres <- data.frame(matrix_vector)
  return(fres)
}
Is it possible to provide test data that shows how your function is not behaving as you'd like? It's probable that your function can be reduced to a line or two of R code while solving your problem, and become much more readable, if we know what it's supposed to do.
Re: [R] how to make a barplot similar to Excel's "clustered column chart"
josef.kar...@phila.gov wrote:

I have a matrix with 12 rows (one for each month) and 2 columns (baseflow, runoff). I would like to make a barplot similar to Excel's "clustered column chart". Here is my matrix 'x':

     8.258754 13.300710
    10.180953 10.760465
    11.012184 13.954887
    10.910870 13.839839
     9.023519 11.511129
     7.189241 12.519830
     5.925576 17.101491
     5.211613 13.585175
     5.039592 13.506304
     4.462325  9.963006
     5.586521 11.306202
     7.739242 14.669374

If I use barplot(x, beside=T), column 1 appears on the left side of the plot and then column 2 appears on the right side of the plot. I would rather that they alternate, so that for month 1 I see 2 bars (column 1, then column 2), and so forth for months 2-12. Is there a simple way to do this? (In Excel this is the chart option for "clustered column".)

Can't you just transpose x?

    barplot(t(x), beside=T)

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45) 38153501
Email: pd@cbs.dk   Priv: pda...@gmail.com
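To see why the transpose works, here is a runnable sketch that rebuilds the 12 x 2 matrix from the post (the month and column names are added for illustration; the original matrix was unnamed). barplot() draws one group of bars per *column* of its input, so transposing turns the 12 months into 12 groups of 2 bars each:

```r
# Rebuild the poster's 12 x 2 matrix (values from the post, names assumed)
x <- matrix(c(8.258754, 13.300710, 10.180953, 10.760465,
              11.012184, 13.954887, 10.910870, 13.839839,
              9.023519, 11.511129,  7.189241, 12.519830,
              5.925576, 17.101491,  5.211613, 13.585175,
              5.039592, 13.506304,  4.462325,  9.963006,
              5.586521, 11.306202,  7.739242, 14.669374),
            ncol = 2, byrow = TRUE,
            dimnames = list(month.abb, c("baseflow", "runoff")))

# t(x) is 2 x 12: one group per month, two bars per group
barplot(t(x), beside = TRUE, legend.text = TRUE)
```

With beside = TRUE and the transposed matrix, the legend picks up the row names of t(x) (baseflow/runoff) and the group labels come from its column names (the months), matching Excel's clustered layout.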
[R] Time series data file?
Hello,

I am currently splitting a file into individual files (each time series separated into its own file). The file I read in skips the first four lines and extracts the data columns I need. I was wondering if there is a way for R to automatically scan and separate the files based on the header information? The header delineates each observation with 254, space, time, space, day, space, month, space, year:

    254 0 26 NOV1995

I would like to then use the program I have written to do the analysis. I have attached a small subset text file of data (two observations). Any thoughts would be very helpful.

    # Read in data file
    data <- read.table("file.txt", skip=4, header=FALSE)
    temp <- data$V4
    elev <- data$V3

Thank you,
Doug

--
Douglas M. Hultstrand, MS
Senior Hydrometeorologist
Metstat, Inc., Windsor, Colorado
voice: 720.771.5840
email: dmhul...@metstat.com
web: http://www.metstat.com

    254 0 26 NOV1995
    1 24232 72694 44.92N123.02W61 2302 2100 1960 2680149 9 4 3 SLE9 kt
    9 10071 61106 94180 10 5 10037 89110 82183 12 4 1120106 84186 12 6
    9780304 9 9205 19 6 9500544 74 59221 19
    254 12 26 NOV1995
    1 24232 72694 44.92N123.02W61 2302 2100 1960 2680149 9 4 3 SLE9 kt
    9 10071 61106 94180 10 5 10037 89110 82183 12 4 1120106 84186 12 6
    9780304 9 9205 19 6 9500544 74 59221 19
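One way to split on those "254 ..." header lines is to read the whole file with readLines(), flag the header rows, and use cumsum() of the flags as a grouping variable for split(). A sketch, with a small in-memory stand-in for readLines("file.txt") so it runs as-is (the real column meanings are whatever your existing program expects):

```r
# 'raw' stands in for readLines("file.txt"); each observation starts with 254
raw <- c("254 0 26 NOV1995",
         "1 24232 72694 44.92 123.02",
         "9 10071 61106 94180 10",
         "254 12 26 NOV1995",
         "1 24232 72694 44.92 123.02",
         "9 10071 61106 94180 10")

is_hdr <- grepl("^\\s*254\\b", raw)     # TRUE on each observation's header line
blocks <- split(raw, cumsum(is_hdr))    # one character vector per observation

# Parse each block, dropping its header row, much like read.table() above:
get_obs <- function(block)
  read.table(text = block[-1], header = FALSE, fill = TRUE)
obs <- lapply(blocks, get_obs)
length(obs)   # one data frame per observation
```

Each element of obs can then be written out with write.table(), or fed directly to the existing analysis function, without splitting the file by hand first.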
Re: [R] how to make a barplot similar to Excel's "clustered column chart"
Josef,

I think all you need to do is use the transpose of your data matrix. So if your matrix is called x:

    barplot(t(as.matrix(x)), beside=T)
[R] 3D ncdf file
Hello,

I have an ncdf file with different variables for dimensions and dates. The dimensions are as follows, where X=i and Y=j, creating an 85 by 188 set of cells. For each cell there are 12 readings of DO per day, taken at 2-hour intervals and recorded according to the Julian calendar under the Ordinal_Dates variable.

    DO: [2280x188x85]
    DX: [188x85]
    DY: [188x85]
    X: [85x1]
    Y: [188x1]
    Ordinal_Dates: [2280x1]
    T: [2280x1]

So far I have set up each variable as follows, and I don't know how to work with the 3D DO variable, which contains 12 values per day:

    setwd("C:/Documents and Settings/Desktop/R")
    library(ncdf)
    profile1 <- open.ncdf("2005r1_v3_Ct=500_lerie_out_bot.nc")
    name1 <- get.var.ncdf(profile1, "DO")
    name2 <- get.var.ncdf(profile1, "DX")
    name3 <- get.var.ncdf(profile1, "DY")
    name4 <- get.var.ncdf(profile1, "X")
    name5 <- get.var.ncdf(profile1, "Y")
    name6 <- get.var.ncdf(profile1, "Ordinal_Dates")
    name7 <- get.var.ncdf(profile1, "T")

I want to set a threshold and calculate how many days have an average DO greater than my threshold, and also what the daily averages of DO are. Any help is appreciated!

Emilija
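A sketch of the daily-averaging step, assuming the DO array (name1 above) has time as its first dimension with exactly 12 readings per day, so 2280 time steps = 190 days. A small random array stands in for the real data so the sketch runs; the threshold value is made up:

```r
# Stand-in for the DO array from get.var.ncdf(): [time, 4, 3] with 12/day
readings_per_day <- 12
do_arr <- array(runif(12 * 5 * 4 * 3, 0, 10), dim = c(12 * 5, 4, 3))

n_days <- dim(do_arr)[1] / readings_per_day      # e.g. 2280 / 12 = 190
day_id <- rep(seq_len(n_days), each = readings_per_day)

# Daily mean DO for every cell: result is an [n_days x ny x nx] array
daily_mean <- apply(do_arr, c(2, 3), function(ts) tapply(ts, day_id, mean))

# Days per cell whose daily average exceeds the threshold (value made up)
threshold <- 5
days_above <- apply(daily_mean > threshold, c(2, 3), sum, na.rm = TRUE)
```

apply() over margins c(2, 3) hands each cell's full time series to the inner function, and tapply() collapses it to one mean per day; days_above then ends up with one count per grid cell.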
[R] NPMC
Hi,

I am a new user of R and want to analyse some data using npmc. My data have several levels of factors (Site, Year and Season) and several variables (percentages). I have tried to use npmc but I always get an error message. My data are in a table following this example:

    Site  Year  Season  Var1  Var2
    A     2009  Dry       10    56
    B     ...

Here is the error message I get:

    Erreur dans as.vector(x, mode) : argument 'mode' incorrect
    (Error in as.vector(x, mode): invalid 'mode' argument)

How should I call npmc? Thank you for your help.
Re: [R] Subtracting POSIXct date/times
See the help page for the difftime() function, which will tell you how to specify the units of the differences. (When you don't specify, it chooses the units according to some rules.)

-Don

At 4:24 PM -0400 6/14/10, James Rome wrote:

I have two data frame columns of POSIXct date/times that include seconds. I got them into this format using, for example:

    zsort$ETA <- as.POSIXct(as.character(zsort$ETA), format="%m/%d/%Y %H:%M:%S")

My problem is that when I subtract the two columns, sometimes the difference is given in seconds and sometimes it is given in minutes. I don't care which it is, but I need to know which one I will get.

    DateTime             ETA
    2010-05-16 02:19:56  2010-05-16 03:46:35
    ...

    Browse[1]> mins = zsort$ETA - zsort$DateTime
    Browse[1]> mins
    Time differences in hours
    [1] 1.444167 2.685000 3.077222 3.210278 3.248056 3.281944 3.281944
        3.360278 3.360278 3.582778 4.57 5.506111 5.857778 6.150278
        6.150278 6.243056 6.243889 6.248056 6.248611 6.248611 6.356667
    attr(,"tzone")

But sometimes the answer is in seconds:

    # make a column with the minutes before landing
    zsort$MinBeforeLand = zsort$ETA - zsort$DateTime
    zsort$MinBeforeLand
    Time differences in secs
    [1] -50 136 221 878 1192 2263 3296 3959 4968 5846 8709 11537 12198
        12442 12642 15952 18273 19952 20538

How do I specify the resultant units?

Thanks,
Jim Rome

--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
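Concretely, using the first pair of times from the post: "-" on POSIXct values picks the units automatically, while difftime() with an explicit units= argument always returns the units you ask for, and as.numeric() strips the difftime class when you want a plain number:

```r
# First DateTime/ETA pair from the post
t1 <- as.POSIXct("2010-05-16 02:19:56")
t2 <- as.POSIXct("2010-05-16 03:46:35")

t2 - t1                                       # units chosen for you (hours here)
difftime(t2, t1, units = "mins")              # always minutes
as.numeric(difftime(t2, t1, units = "mins"))  # plain number: 86.65
```

So for the "minutes before landing" column, difftime(zsort$ETA, zsort$DateTime, units = "mins") gives a result that is in minutes regardless of how large or small the gaps are.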