[R] 5 remaining places on stats course in Edmonton
There are 5 remaining places on the following course: Data exploration, regression, GLM GAM. With introduction to R When: 26 - 30 August 2013. Where: Edmonton, Canada For details, see: http://www.highstat.com/statscourse.htm Course flyer: http://www.highstat.com/Courses/Flyer2013_09Canada.pdf Kind regards, Alain Zuur -- Dr. Alain F. Zuur First author of: 1. Analysing Ecological Data (2007). Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p. URL: www.springer.com/0-387-45967-7 2. Mixed effects models and extensions in ecology with R. (2009). Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer. http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9 3. A Beginner's Guide to R (2009). Zuur, AF, Ieno, EN, Meesters, EHWG. Springer http://www.springer.com/statistics/computational/book/978-0-387-93836-3 4. Zero Inflated Models and Generalized Linear Mixed Models with R. (2012) Zuur, Saveliev, Ieno. http://www.highstat.com/book4.htm Other books: http://www.highstat.com/books.htm Statistical consultancy, courses, data analysis and software Highland Statistics Ltd. 6 Laverock road UK - AB41 6FN Newburgh Tel: 0044 1358 788177 Email: highs...@highstat.com URL: www.highstat.com URL: www.brodgar.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sample size determination - comparing means for different size samples
Hello, At an R prompt, type ?power.t.test Hope this helps, Rui Barradas Em 06-06-2013 20:58, Rebecca Greenblatt escreveu: Looking to determine sample sizes for both my experimental and control groups (I want only a small portion of my participants in my experimental condition) in order to compare population means. I would be able to estimate standard deviation beforehand. I'm using the bpower function from the Hmisc R package, specifically bsamsize to compare population proportions, but was wondering if there was an equivalent function for comparing means. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Identifying breakpoints/inflection points?
You can try this: library(inflection) #you have to instsall package inflection first a-findiplist(cbind(year),cbind(piproute),1) a The answer: [,1] [,2] [,3] [1,]5 35 1986.0 [2,]5 30 1983.5 shows that the total inflection point is between 1983 and 1986, if we treat data as first concave and then convex, as it can be found from a simple graph. -- View this message in context: http://r.789695.n4.nabble.com/Identifying-breakpoints-inflection-points-tp2065886p4668889.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Function nlme::lme in Ubuntu (but not Win or OS X): Non-positive definite approximate variance-covariance
Dear all, I am estimating a mixed-model in Ubuntu Raring (13.04¸ amd64), with the code: fm0 - lme(rt ~ run + group * stim * cond, random=list( subj=pdSymm(~ 1 + run), subj=pdSymm(~ 0 + stim)), data=mydat1) When I check the approximate variance-covariance matrix, I get: fm0$apVar [1] Non-positive definite approximate variance-covariance *However*, if I do the same on a Windows or a Mac OS X machine, I get: fm0$apVar reStruct.subj1 reStruct.subj2 reStruct.subj3 reStruct.subj1 reStruct.subj1 1.952757e-01 3.130089e-01 5.766955e-01 -0.1034862377 reStruct.subj2 3.130089e-01 5.703211e-01 9.864047e-01 -0.1937902915 reStruct.subj3 5.766955e-01 9.864047e-01 1.861434e+00 -0.3303689957 reStruct.subj1 -1.034862e-01 -1.937903e-01 -3.303690e-01 0.0941652375 reStruct.subj2 -3.464557e-03 -5.933008e-03 -1.102904e-02 0.0163847582 reStruct.subj3 -6.826057e-01 -1.186231e+00 -2.122872e+00 0.3705613715 lSigma -2.058254e-06 -4.764783e-06 1.653583e-05 -0.146693 reStruct.subj2 reStruct.subj3lSigma reStruct.subj1 -3.464557e-03 -6.826057e-01 -2.058254e-06 reStruct.subj2 -5.933008e-03 -1.186231e+00 -4.764783e-06 reStruct.subj3 -1.102904e-02 -2.122872e+00 1.653583e-05 reStruct.subj1 1.638476e-02 3.705614e-01 -1.466930e-05 reStruct.subj2 1.931738e-02 -2.020354e-02 -1.925015e-05 reStruct.subj3 -2.020354e-02 2.825301e+00 2.967063e-05 lSigma -1.925015e-05 2.967063e-05 6.299813e-05 attr(,Pars) reStruct.subj1 reStruct.subj2 reStruct.subj3 reStruct.subj1 reStruct.subj2 4.570267 4.299735 1.282732 4.859346 3.206102 reStruct.subj3 lSigma -2.631377 5.308304 attr(,natural) [1] TRUE I am unable to figure out the reason why I get such a difference in the output: all the machines are running the latest version of R (3.0.1) and nlme package (3.1-109) as of today (updated from the CRAN master site). Many thanks in advance for any suggestions, best giuseppe -- Giuseppe Pagnoni Dip. Scienze Biomediche, Metaboliche e Neuroscienze Sezione Fisiologia e Neuroscienze Univ. di Modena e Reggio Emilia Via Campi 287 I-41125 Modena, Italy Tel: +39-059-205-5742 Fax: +39-059-205-5363 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] boot, what am I doing wrong?
I am getting started with the boot package and boot command. As a first step I tried the following. Something is wrong, but i can't see what. Any advice would be much appreciated. x = runif(10) mean(x) [1] 0.467626212374307 boot(x, mean, R=100) Error in mean.default(data, original, ...) : 'trim' must be numeric of length one Enter a frame number, or 0 to exit 1: boot(x, mean, R = 100) 2: statistic(data, original, ...) 3: mean.default(data, original, ...) 4: stop('trim' must be numeric of length one) 5: (function () { utils::recover() })() When I enter a trim parameter I get the following: boot(x, mean, R=100, trim=0) [...some output omitted...] Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = x, statistic = mean, R = 100, trim = 0) Bootstrap Statistics : original biasstd. error t1* 0.467626212374307 0 0 When I enter an na.rm value as well the following is output: boot(x, mean, R=100, trim=0, na.rm=T) ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = x, statistic = mean, R = 100, trim = 0, na.rm = T) Bootstrap Statistics : original biasstd. error t1* 0.467626212374307 0 0 Notice that the standard error is 0, indicating that no bootstrapping has actually taken place. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help on loops
Dear R users, I am stuck here: My first function returns a vector of 5 values. In my second function, I want to repeat this, a number of times, say 10 so that I have 10 rows and five columns but I keep on getting errors. See the code and results below: optm -function(perm, verbose = FALSE) { trace-c() for (k in 1:perm){ trace[k]-Rspatswap(rhox=0.6,rhoy=0.6,sigmasqG=0.081,SsqR=1)[1] perm[k]-k mat-cbind(trace, perm = seq(perm)) } if (verbose){ cat(***starting matrix\n) print(mat) } # iterate till done while(nrow(mat) 1){ high - diff(mat[, 'trace']) 0 if (!any(high)) break # done # find which one to delete delete - which.max(high) + 1L mat - mat[-delete, ] newmat-apply(mat,2,mean)[1] sdm-sd(mat[,1]) sem-sdm/sqrt(nrow(mat)) maxv-mat[1,1] minv-mat[nrow(mat),1] } stats-cbind(average=newmat,sd=sdm,se=sem,min=minv,max=maxv) stats } test-optm(perm=20) test average sd se min max trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241 itn-function(it){ siml-matrix(NA,ncol=5,nrow=length(it)) for(g in 1:it){ siml[g]-optm(perm=20) } siml-cbind(siml=siml) siml } ans-itn(5) Warning messages: 1: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 2: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 3: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 4: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 5: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length ans [,1] [,2] [,3] [,4] [,5] [1,] 0.8874234 0.8861666 0.8880521 0.8870958 0.8876469 On 5/31/2013 9:53 PM, jim holtman wrote: Is this what you want? I was not clear on your algorithm, but is looks like you wanted descending values: testx - + function(n, verbose = FALSE) + { + mat - cbind(optA = sample(n, n, TRUE), perm = seq(n)) + if (verbose){ + cat(***starting matrix\n) + print(mat) + } + # iterate till done + while(nrow(mat) 1){ + high - diff(mat[, 'optA']) 0 + if (!any(high)) break # done + # find which one to delete + delete - which.max(high) + 1L + mat - mat[-delete, ] + } + mat + } testx(20, verbose = TRUE) ***starting matrix optA perm [1,] 131 [2,] 122 [3,]73 [4,] 104 [5,] 115 [6,]46 [7,] 117 [8,]28 [9,]69 [10,]5 10 [11,]6 11 [12,] 18 12 [13,]9 13 [14,] 16 14 [15,] 18 15 [16,]9 16 [17,]2 17 [18,]7 18 [19,] 15 19 [20,]7 20 optA perm [1,] 131 [2,] 122 [3,]73 [4,]46 [5,]28 [6,]2 17 On Fri, May 31, 2013 at 2:02 PM, Laz lmra...@ufl.edu mailto:lmra...@ufl.edu wrote: Dear R Users, I created a function which returns a value every time it's run (A simplified toy example is attached on this mail). For example spat(a,b,c,d) # run the first time and it gives you ans1= 10, perm=1 ; run the second time and gives you ans2= 7, perm=2 etc I store both the result and the iteration on a matrix called vector with columns:1==ans, column2==permutation The rule I want to implement is: compare between ans1 and ans2. If ans1ans2, keep both ans1 and ans2. if ans1ans2, drop the whole row of the second permutation (that is drop both ans2 and perm2 but continue counting all permutations). Re-run the function for the 3rd time and repeat comparison between the value of the last run and the current value obtained. Return matrix ans with the saved ans and their permutations and another full matrix with all results including the dropped ans and their permutation numbers. I need to repeat this process 1000 times but I am getting stuck below. See attached R code. The code below works only for the first 6 permutations. suppose I want 1000 permutations, where do I keep the loop? Example: Not a perfect code though it somehow works: testx-function(perm){ optA-c() #while(perm=2){ for (k in 1:perm){ #repeat { optA[k]-sample(1:1000,1,replace=TRUE) perm[k]-k } mat2-as.matrix(cbind(optA=optA,perm=perm)) result-mat2 lenm-nrow(result) if(result[1,1]=result[2,1]) result-result[1,] return(list(mat2=mat2,result=result)) #} if(perm2){ mat2-as.matrix(cbind(optA=optA,perm=perm))
Re: [R] boot, what am I doing wrong?
Hi there, You need a function for your statistic: boot(x, function(x, index) mean(x[index]), R = 1000) ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = x, statistic = function(x, index) mean(x[index]), R = 1000) Bootstrap Statistics : original biasstd. error t1* 0.4069613 0.005465687 0.07355997 See http://www.mayin.org/ajayshah/KB/R/documents/boot.html for more information and examples. HTH, Jorge.- On Fri, Jun 7, 2013 at 6:03 PM, Rguy r...@123mail.org wrote: I am getting started with the boot package and boot command. As a first step I tried the following. Something is wrong, but i can't see what. Any advice would be much appreciated. x = runif(10) mean(x) [1] 0.467626212374307 boot(x, mean, R=100) Error in mean.default(data, original, ...) : 'trim' must be numeric of length one Enter a frame number, or 0 to exit 1: boot(x, mean, R = 100) 2: statistic(data, original, ...) 3: mean.default(data, original, ...) 4: stop('trim' must be numeric of length one) 5: (function () { utils::recover() })() When I enter a trim parameter I get the following: boot(x, mean, R=100, trim=0) [...some output omitted...] Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used Warning in if (na.rm) x - x[!is.na(x)] : the condition has length 1 and only the first element will be used ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = x, statistic = mean, R = 100, trim = 0) Bootstrap Statistics : original biasstd. error t1* 0.467626212374307 0 0 When I enter an na.rm value as well the following is output: boot(x, mean, R=100, trim=0, na.rm=T) ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = x, statistic = mean, R = 100, trim = 0, na.rm = T) Bootstrap Statistics : original biasstd. error t1* 0.467626212374307 0 0 Notice that the standard error is 0, indicating that no bootstrapping has actually taken place. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help on loops
On 07-06-2013, at 10:59, Laz lmra...@ufl.edu wrote: Dear R users, I am stuck here: My first function returns a vector of 5 values. In my second function, I want to repeat this, a number of times, say 10 so that I have 10 rows and five columns but I keep on getting errors. See the code and results below: optm -function(perm, verbose = FALSE) { trace-c() for (k in 1:perm){ trace[k]-Rspatswap(rhox=0.6,rhoy=0.6,sigmasqG=0.081,SsqR=1)[1] perm[k]-k mat-cbind(trace, perm = seq(perm)) } if (verbose){ cat(***starting matrix\n) print(mat) } # iterate till done while(nrow(mat) 1){ high - diff(mat[, 'trace']) 0 if (!any(high)) break # done # find which one to delete delete - which.max(high) + 1L mat - mat[-delete, ] newmat-apply(mat,2,mean)[1] sdm-sd(mat[,1]) sem-sdm/sqrt(nrow(mat)) maxv-mat[1,1] minv-mat[nrow(mat),1] } stats-cbind(average=newmat,sd=sdm,se=sem,min=minv,max=maxv) stats } test-optm(perm=20) test average sd se min max trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241 itn-function(it){ siml-matrix(NA,ncol=5,nrow=length(it)) for(g in 1:it){ siml[g]-optm(perm=20) } siml-cbind(siml=siml) siml } ans-itn(5) Warning messages: 1: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 2: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 3: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 4: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 5: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length ans [,1] [,2] [,3] [,4] [,5] [1,] 0.8874234 0.8861666 0.8880521 0.8870958 0.8876469 1. Not reproducible code. Where does function Rspatswap come from? 2. You have several errors in function itn: Argument it is a scalar: length(it) is 1. You need to do siml - matrix(NA,ncol=5,nrow=it) Next in the g-loop you want to fill row g so do: siml[g,] - ….. Finally why are you doing siml - cbind(siml=siml)? Seems superfluous to me. Delete the line. Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] matched samples, dataframe, panel data
I R-helpers #I have a data panel of thousands of firms, by year and industry and #one dummy variable that separates the firms in two categories: 1 if the firm have an auditor; 0 if not #and another variable the represents the firm dimension (total assets in thousand of euros) #I need to create two separated samples with the same number os firms where #one firm in the first have a corresponding firm in the second with the same #year, industry and dimension (the dimension doesn't need to be exactly the #same, it could vary in an interval of +/- 10%, for example) #My reproducible example firm1-sort(rep(1:10,5),decreasing=F) year1-rep(2000:2004,10) industry1-rep(20,50) dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1) dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789, 2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824, 23,2897,3456,7690,6022,3678,9431,2890) data1-data.frame(firm1,year1,industry1,dummy1,dimension1) data1 colnames(data1)-c(firm,year,industry,dummy,dimension) firm2-sort(rep(11:15,3),decreasing=F) year2-rep(2001:2003,5) industry2-rep(30,15) dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1) dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345) data2-data.frame(firm2,year2,industry2,dummy2,dimension2) data2 colnames(data2)-c(firm,year,industry,dummy,dimension) firm3-sort(rep(16:20,4),decreasing=F) year3-rep(2001:2004,5) industry3-rep(40,20) dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0) dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824, 23,2897) data3-data.frame(firm3,year3,industry3,dummy3,dimension3) data3 colnames(data3)-c(firm,year,industry,dummy,dimension) final1-rbind(data1,data2) final2-rbind(final1,data3) final2 final3-final2[order(final2$year,final2$industry,final2$dimension),] final3 Thank you very much, Cecília Carmo Universidade de Aveiro - Portugal [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help on loops
Dear Berend, For reproducibility, Rspatswap() is a function which originally returns a single value. For example Rspatswap(...) and you get 0.8 So, run Rspatswap() 20 times and store all the values. Then from these 20 values, calculate the calculate average, sd,se,min,max to get something similar to: average sd se min max trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241 If we repeat the function 10 times, then I expect 10 rows with 5 columns but it does not work ? I hope I am now clearer... On 6/7/2013 5:24 AM, Berend Hasselman wrote: On 07-06-2013, at 10:59, Laz lmra...@ufl.edu wrote: Dear R users, I am stuck here: My first function returns a vector of 5 values. In my second function, I want to repeat this, a number of times, say 10 so that I have 10 rows and five columns but I keep on getting errors. See the code and results below: optm -function(perm, verbose = FALSE) { trace-c() for (k in 1:perm){ trace[k]-Rspatswap(rhox=0.6,rhoy=0.6,sigmasqG=0.081,SsqR=1)[1] perm[k]-k mat-cbind(trace, perm = seq(perm)) } if (verbose){ cat(***starting matrix\n) print(mat) } # iterate till done while(nrow(mat) 1){ high - diff(mat[, 'trace']) 0 if (!any(high)) break # done # find which one to delete delete - which.max(high) + 1L mat - mat[-delete, ] newmat-apply(mat,2,mean)[1] sdm-sd(mat[,1]) sem-sdm/sqrt(nrow(mat)) maxv-mat[1,1] minv-mat[nrow(mat),1] } stats-cbind(average=newmat,sd=sdm,se=sem,min=minv,max=maxv) stats } test-optm(perm=20) test average sd se min max trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241 itn-function(it){ siml-matrix(NA,ncol=5,nrow=length(it)) for(g in 1:it){ siml[g]-optm(perm=20) } siml-cbind(siml=siml) siml } ans-itn(5) Warning messages: 1: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 2: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 3: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 4: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 5: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length ans [,1] [,2] [,3] [,4] [,5] [1,] 0.8874234 0.8861666 0.8880521 0.8870958 0.8876469 1. Not reproducible code. Where does function Rspatswap come from? 2. You have several errors in function itn: Argument it is a scalar: length(it) is 1. You need to do siml - matrix(NA,ncol=5,nrow=it) Next in the g-loop you want to fill row g so do: siml[g,] - ….. Finally why are you doing siml - cbind(siml=siml)? Seems superfluous to me. Delete the line. Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Conditional coloring of area between curves
Hi Duncan, that's exactly what I was looking for! The series have common times, so that's no problem. Thanks a lot! Roland Von: Duncan Murdoch [murdoch.dun...@gmail.com] Gesendet: Donnerstag, 6. Juni 2013 18:45 An: Roland Pape Cc: r-help@r-project.org Betreff: Re: [R] Conditional coloring of area between curves On 06/06/2013 10:41 AM, Roland Pape wrote: Dear list, I have two time series of temperatures from different sites plotted in the same diagram and would like to color the area between the curves differently, dependent on whether site 1 is cooler than site 2 (colored let's say in blue) or warmer (red). While polygone() works fine to color the enclosed area in general, I'm struggling with this conditional coloring. Any help is greatly appreciated! Suppose the series are named site1 and site2, and they have common times in a variable times. Then the following should do what you want: smaller - pmin(site1, site2) plot(x, site1, ylim = range(c(site1, site2)), type=n) polygon(c(x, rev(x)), c(smaller, rev(site1)), col=red) polygon(c(x, rev(x)), c(smaller, rev(site2)), col=blue) If the times for the two series are different it's a little harder; first you need to give them common times, and that will depend on how you decide to evaluate the values between observations. Probably linear interpolation (using approx()) is fine, but it's up to you. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Distance-based non-parametric ANOVA
Hi Mikhail, After a cursory online search using simply A new method for non-parametric multivariate analysis + CRAN + R, I got the following three packages which include this paper in their references: vegan, mefa, and TraMineR. I don't know whether they deal with the procedure you are after, but it would be worth having a look. Regards, José Prof. José Iparraguirre Chief Economist Age UK Age UK Tavis House, 1- 6 Tavistock Square London, WC1H 9NB -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mikhail Umorin Sent: 06 June 2013 21:35 To: r-help@r-project.org Subject: [R] Distance-based non-parametric ANOVA Hello, I am comparing treatments by comparing within group to between group distances like described in MJ Anderson. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26: 32 -- 46. The idea is to find the ratio of within group sum of distance^2 to between group sum of distance^2. Then find the distribution of the statistic by permuting the sample. Has anyone done this before in R code? Are there any packages (I did checked on CRAN - none). Code? I want to make sure I won't be inventing a bicycle before I write my own package. Comments? Suggestions? (I have never written an R package before) Thank you for your time, Mikhail. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Age UK Improving later life www.ageuk.org.uk --- Age UK is a registered charity and company limited by guarantee, (registered charity number 1128267, registered company number 6825798). Registered office: Tavis House, 1-6 Tavistock Square, London WC1H 9NA. For the purposes of promoting Age UK Insurance, Age UK is an Appointed Representative of Age UK Enterprises Limited, Age UK is an Introducer Appointed Representative of JLT Benefit Solutions Limited and Simplyhealth Access for the purposes of introducing potential annuity and health cash plans customers respectively. Age UK Enterprises Limited, JLT Benefit Solutions Limited and Simplyhealth Access are all authorised and regulated by the Financial Services Authority. -- This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you receive a message in error, please advise the sender and delete immediately. Except where this email is sent in the usual course of our business, any opinions expressed in this email are those of the author and do not necessarily reflect the opinions of Age UK or its subsidiaries and associated companies. Age UK monitors all e-mail transmissions passing through its network and may block or modify mails which are deemed to be unsuitable. Age Concern England (charity number 261794) and Help the Aged (charity number 272786) and their trading and other associated companies merged on 1st April 2009. Together they have formed the Age UK Group, dedicated to improving the lives of people in later life. The three national Age Concerns in Scotland, Northern Ireland and Wales have also merged with Help the Aged in these nations to form three registered charities: Age Scotland, Age NI, Age Cymru. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help on loops
Hi, I am almost getting there, but still have errors. Thanks for your help. I have tried improving but I get the following errors below: itn-function(it){ +siml-matrix(NA,ncol=5,nrow=it) +for(g in 1:it){ +siml[g]-optm(perm=20)[g] +} +siml +} itn(3) [,1] [,2] [,3] [,4] [,5] [1,] 0.8873775898 NA NA NA NA [2,] 0.0015584824 NA NA NA NA [3,] 0.0001414317 NA NA NA NA itn-function(it){ +siml-matrix(NA,ncol=5,nrow=it) +for(g in 1:it){ +siml[g]-optm(perm=20) +} +siml +} itn(3) [,1] [,2] [,3] [,4] [,5] [1,] 0.8880941 NA NA NA NA [2,] 0.8869727 NA NA NA NA [3,] 0.8877045 NA NA NA NA Warning messages: 1: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 2: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 3: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length I expect something close to average sd se min max 0.8881969 0.0008215379 0.000410769 0.8873842 0.8890167 0.884659 0.0004215379 0.000410769 0.2342 0.676307 0.8885839 0.0001215379 0.0002112 0.82752992 0.8836337 Thanks fpr you help. On 6/7/2013 5:24 AM, Berend Hasselman wrote: On 07-06-2013, at 10:59, Laz lmra...@ufl.edu wrote: Dear R users, I am stuck here: My first function returns a vector of 5 values. In my second function, I want to repeat this, a number of times, say 10 so that I have 10 rows and five columns but I keep on getting errors. See the code and results below: optm -function(perm, verbose = FALSE) { trace-c() for (k in 1:perm){ trace[k]-Rspatswap(rhox=0.6,rhoy=0.6,sigmasqG=0.081,SsqR=1)[1] perm[k]-k mat-cbind(trace, perm = seq(perm)) } if (verbose){ cat(***starting matrix\n) print(mat) } # iterate till done while(nrow(mat) 1){ high - diff(mat[, 'trace']) 0 if (!any(high)) break # done # find which one to delete delete - which.max(high) + 1L mat - mat[-delete, ] newmat-apply(mat,2,mean)[1] sdm-sd(mat[,1]) sem-sdm/sqrt(nrow(mat)) maxv-mat[1,1] minv-mat[nrow(mat),1] } stats-cbind(average=newmat,sd=sdm,se=sem,min=minv,max=maxv) stats } test-optm(perm=20) test average sd se min max trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241 itn-function(it){ siml-matrix(NA,ncol=5,nrow=length(it)) for(g in 1:it){ siml[g]-optm(perm=20) } siml-cbind(siml=siml) siml } ans-itn(5) Warning messages: 1: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 2: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 3: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 4: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 5: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length ans [,1] [,2] [,3] [,4] [,5] [1,] 0.8874234 0.8861666 0.8880521 0.8870958 0.8876469 1. Not reproducible code. Where does function Rspatswap come from? 2. You have several errors in function itn: Argument it is a scalar: length(it) is 1. You need to do siml - matrix(NA,ncol=5,nrow=it) Next in the g-loop you want to fill row g so do: siml[g,] - .. Finally why are you doing siml - cbind(siml=siml)? Seems superfluous to me. Delete the line. Berend [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Distance-based non-parametric ANOVA
Hello, You also can use the package sos findFn('non-parametric multivariate ANOVA') Regards, Pascal 2013/6/7 Mikhail Umorin mike...@gmail.com Hello, I am comparing treatments by comparing within group to between group distances like described in MJ Anderson. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26: 32 -- 46. The idea is to find the ratio of within group sum of distance^2 to between group sum of distance^2. Then find the distribution of the statistic by permuting the sample. Has anyone done this before in R code? Are there any packages (I did checked on CRAN - none). Code? I want to make sure I won't be inventing a bicycle before I write my own package. Comments? Suggestions? (I have never written an R package before) Thank you for your time, Mikhail. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help on loops
On 07-06-2013, at 11:57, Laz lmra...@ufl.edu wrote: Dear Berend, I have made some changes but I still get errors: For reproducibility, Rspatswap() is a function which originally returns a single value. For example Rspatswap(...) and you get 0.8 So, run Rspatswap() 20 times and store all the values. Then from these 20 values, calculate the calculate average, sd,se,min,max to get something similar to: average sd se min max trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241 If we repeat the function 10 times, then I expect 10 rows with 5 columns but it does not work ? Replacing Rspatswap(…)[1] with runif(1) (you could/should have done that to provide reproducible code) and the modifications I suggested you make in your function itn() like this itn-function(it){ siml-matrix(NA,ncol=5,nrow=it) for(g in 1:it){ siml[g,]-optm(perm=20) } siml } running itn(10) I get a matrix with 10 rows and 5 columns. So what do you mean by it does not work? You sometimes get an error message from apply due to a bad dim() like this Error in apply(mat, 2, mean) : dim(X) must have a positive length When the result of mat[-delete,] is a matrix with a single row the result is simplified to a vector. See ?[ in particular the drop argument. So this should work in function optm() mat - mat[-delete,, drop=FALSE] Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R help on loops
Please, please do not use html formatted mail. It is clearly requested by the mailing list The code of your last mail is a mess and when replying it becomes even more of a mess. I told you to do siml[g,] - optm(perm=20) See the comma after g. and not what you are now doing with siml[g]-optm(perm=20)[g] I'm giving up. Berend On 07-06-2013, at 12:15, Laz lmra...@ufl.edu wrote: Hi, I am almost getting there, but still have errors. Thanks for your help. I have tried improving but I get the following errors below: itn-function(it){ + siml-matrix(NA,ncol=5,nrow=it) + for(g in 1:it){ + siml[g]-optm(perm=20)[g] + } + siml + } itn(3) [,1] [,2] [,3] [,4] [,5] [1,] 0.8873775898 NA NA NA NA [2,] 0.0015584824 NA NA NA NA [3,] 0.0001414317 NA NA NA NA itn-function(it){ + siml-matrix(NA,ncol=5,nrow=it) + for(g in 1:it){ + siml[g]-optm(perm=20) + } + siml + } itn(3) [,1] [,2] [,3] [,4] [,5] [1,] 0.8880941 NA NA NA NA [2,] 0.8869727 NA NA NA NA [3,] 0.8877045 NA NA NA NA Warning messages: 1: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 2: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 3: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length I expect something close to average sd se min max 0.8881969 0.0008215379 0.000410769 0.8873842 0.8890167 0.884659 0.0004215379 0.000410769 0.2342 0.676307 0.8885839 0.0001215379 0.0002112 0.82752992 0.8836337 Thanks fpr you help. On 6/7/2013 5:24 AM, Berend Hasselman wrote: On 07-06-2013, at 10:59, Laz lmra...@ufl.edu wrote: Dear R users, I am stuck here: My first function returns a vector of 5 values. In my second function, I want to repeat this, a number of times, say 10 so that I have 10 rows and five columns but I keep on getting errors. See the code and results below: optm -function(perm, verbose = FALSE) { trace-c() for (k in 1:perm){ trace[k]-Rspatswap(rhox=0.6,rhoy=0.6,sigmasqG=0.081,SsqR=1)[1] perm[k]-k mat-cbind(trace, perm = seq(perm)) } if (verbose){ cat(***starting matrix\n) print(mat) } # iterate till done while(nrow(mat) 1){ high - diff(mat[, 'trace']) 0 if (!any(high)) break # done # find which one to delete delete - which.max(high) + 1L mat - mat[-delete, ] newmat-apply(mat,2,mean)[1] sdm-sd(mat[,1]) sem-sdm/sqrt(nrow(mat)) maxv-mat[1,1] minv-mat[nrow(mat),1] } stats-cbind(average=newmat,sd=sdm,se=sem,min=minv,max=maxv) stats } test-optm(perm=20) test average sd se min max trace 0.8880286 0.0009178193 0.0004589096 0.8870152 0.889241 itn-function(it){ siml-matrix(NA,ncol=5,nrow=length(it)) for(g in 1:it){ siml[g]-optm(perm=20) } siml-cbind(siml=siml) siml } ans-itn(5) Warning messages: 1: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 2: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 3: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 4: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length 5: In siml[g] - optm(perm = 20) : number of items to replace is not a multiple of replacement length ans [,1] [,2] [,3] [,4] [,5] [1,] 0.8874234 0.8861666 0.8880521 0.8870958 0.8876469 1. Not reproducible code. Where does function Rspatswap come from? 2. You have several errors in function itn: Argument it is a scalar: length(it) is 1. You need to do siml - matrix(NA,ncol=5,nrow=it) Next in the g-loop you want to fill row g so do: siml[g,] - ….. Finally why are you doing siml - cbind(siml=siml)? Seems superfluous to me. Delete the line. Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SPlus script
On 13-06-06 6:22 PM, Rolf Turner wrote: On 07/06/13 03:19, Scott Raynaud wrote: I actually had tried placing arguments in the call but it didn't work. However, I did not think about writing it to a variable and printing. That seems to have done the trick. Funny, I don't remember having to do that before, but that's not surprising. If I remember correctly --- haven't used Splus for decades --- this is a difference between Splus and R. In R the output of a function is returned *invisibly* if that function is called from within another function. And source() is one such other function. Actually this depends on the caller. source() does return its results invisibly, but many other functions don't. source() is unusual in another way that came up recently (on R-devel, I think): it calls withVisible() on the code that it evaluates, which means that instead of a simple value it will return a list containing the value and an indicator about whether it should be displayed. It returns this list invisibly, leaving it up to whoever called source() to decide whether to display the value or not. Duncan Murdoch So if you have a script, say melvin with the single line: sin(42) and in R you execute source(melvin) you will see no output. If in another script, say clyde you have the single line print(sin(42)) and in R you execute source(clyde) you will see [1] -0.9165215 In Splus, IIRC, the print() call is unnecessary. I.e. you would get the same result by sourcing melvin and clyde. Current Splus users may correct me if I am wrong about this. cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Add prediction interval forest plot (package meta)
At 19:34 06/06/2013, Bernd Weiss wrote: Dear all, I am struggling to add a prediction interval to a forest plot that was created with forest.meta(), package meta. I checked the source of forest.meta() and realized that it is heavily relying on grid. I am lacking any experience with grid graphics. So, I am having difficulties to find out where the random effects estimate is actually plotted in order to add prediction intervals. I appreciate this is not a direct answer to your question but the equivalent function in metafor (available from CRAN) does this and it might be easier to use that. Any help is much appreciated! Bernd R version 3.0.1 Patched (2013-05-28 r62825) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 [4] LC_NUMERIC=CLC_TIME=German_Germany.1252 attached base packages: [1] grid stats graphics grDevices utils datasets methods base other attached packages: [1] meta_2.3-0 loaded via a namespace (and not attached): [1] tools_3.0.1 Michael Dewey i...@aghmed.fsnet.co.uk http://www.aghmed.fsnet.co.uk/home.html __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] arima time series in R
Hi Could just anyone explain me the coefficients in the output of arima model timeseriesarima - arima(series, order=c(1,1,2)) timeseriesarima Series: series ARIMA(1,1,2) Coefficients: ar1 ma1 ma2 0.9744 -1.7695 0.7873 s.e. 0.0310 0.0481 0.0426 sigma^2 estimated as 337.4: log likelihood=-1096.03 AIC=2200.07 AICc=2200.23 BIC=2214.2 CAUTION - Disclaimer * This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS*** [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Add prediction interval forest plot (package meta)
Am 07.06.2013 13:17, schrieb Michael Dewey: At 19:34 06/06/2013, Bernd Weiss wrote: Dear all, I am struggling to add a prediction interval to a forest plot that was created with forest.meta(), package meta. I checked the source of forest.meta() and realized that it is heavily relying on grid. I am lacking any experience with grid graphics. So, I am having difficulties to find out where the random effects estimate is actually plotted in order to add prediction intervals. I appreciate this is not a direct answer to your question but the equivalent function in metafor (available from CRAN) does this and it might be easier to use that. Thanks, Michael! Yes, I vaguely remember that the metafor package can calculate the PIs but most of our analyses rely on the meta package. On the other hand, this might be a good reason to finally switch to metafor. You probably refer to the function addpoly() which can be used for that. I will check it out. Again, thanks for your help, Bernd __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] arima time series in R
On 07/06/2013 12:21, Aakanksha Dahiya01 wrote: Hi Could just anyone explain me the coefficients in the output of arima model The person who wrote the help page already did, but that is hardly 'just anyone'. timeseriesarima - arima(series, order=c(1,1,2)) timeseriesarima Series: series ARIMA(1,1,2) Coefficients: ar1 ma1 ma2 0.9744 -1.7695 0.7873 s.e. 0.0310 0.0481 0.0426 sigma^2 estimated as 337.4: log likelihood=-1096.03 AIC=2200.07 AICc=2200.23 BIC=2214.2 That is not from arima in package stats, so you need to follow the posting guide to tell us whose wrapper it is and hence which help page to read. (Possibly package TSA.) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. That does mean you. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] subset with non logical rules
HI, Try: ?split() source(http://www.openintro.org/stat/data/cdc.R;) str(cdc) #'data.frame': 2 obs. of 9 variables: # $ genhlth : Factor w/ 5 levels excellent,very good,..: 3 3 3 3 2 2 2 2 3 3 ... # $ exerany : num 0 0 1 1 0 1 1 0 0 1 ... # $ hlthplan: num 1 1 1 1 1 1 1 1 1 1 ... # $ smoke100: num 0 1 1 0 0 0 0 0 1 0 ... # $ height : num 70 64 60 66 61 64 71 67 65 70 ... # $ weight : int 175 125 105 132 150 114 194 170 150 180 ... # $ wtdesire: int 175 115 105 124 130 114 185 160 130 170 ... # $ age : int 77 33 49 42 55 55 31 45 27 44 ... # $ gender : Factor w/ 2 levels m,f: 1 2 2 2 2 2 1 1 2 1 ... cdc$genhlth- as.character(cdc$genhlth) cdclst1- split(cdc,cdc$genhlth) lapply(cdclst1,head,2) #$excellent # genhlth exerany hlthplan smoke100 height weight wtdesire age gender #11 excellent 1 1 1 69 186 175 46 m #13 excellent 1 0 1 66 185 220 21 m # #$fair # genhlth exerany hlthplan smoke100 height weight wtdesire age gender #12 fair 1 1 1 69 168 148 62 m #15 fair 1 0 0 69 170 170 23 m # #$good # genhlth exerany hlthplan smoke100 height weight wtdesire age gender #1 good 0 1 0 70 175 175 77 m #2 good 0 1 1 64 125 115 33 f # #$poor # genhlth exerany hlthplan smoke100 height weight wtdesire age gender #53 poor 1 1 1 62 140 130 64 f #79 poor 1 1 0 63 142 120 52 f #$`very good` # genhlth exerany hlthplan smoke100 height weight wtdesire age gender #5 very good 0 1 0 61 150 130 55 f #6 very good 1 1 0 64 114 114 55 f sapply(cdclst1,nrow) #excellent fair good poor very good # 4657 2019 5675 677 6972 cdcGood-cdclst1[[good]] str(cdcGood) #'data.frame': 5675 obs. of 9 variables: # $ genhlth : chr good good good good ... # $ exerany : num 0 0 1 1 0 1 1 0 1 1 ... # $ hlthplan: num 1 1 1 1 1 1 1 0 1 1 ... # $ smoke100: num 0 1 1 0 1 0 1 1 1 1 ... # $ height : num 70 64 60 66 65 70 73 67 75 65 ... # $ weight : int 175 125 105 132 150 180 185 156 200 160 ... # $ wtdesire: int 175 115 105 124 130 170 175 150 190 140 ... # $ age : int 77 33 49 42 27 44 79 47 43 54 ... # $ gender : Factor w/ 2 levels m,f: 1 2 2 2 2 1 1 1 1 2 ... A.K. Hi I am trying to figure out how to subset a bunch of data. As an example I am using the cdc data from openintro.org. In the first column with the name genhlth there are various options that the persons could respond. For exmaple good very good and poor. Now what i would like to do is to seperate the data so that everyone who answered good are stored in one variable and everyone who answered poor are in another variable. Now I know i could just do subset(cdc, cdc$genhlth == poor) to get the poor, but would really like for a code that would seperate data into each group, regardless of what the text or the number of groups are. Can anyone give me a hint? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Evaluation of function names...
Folks: Feel free to provide me a link or reference instead of an answer. Preamble: f - function() function(x)rnorm(x) g - f() g(3) ## [1] -0.4448492 -0.2379978 -0.4537394 ## But rnorm - function()1 ## nasty nasty g(3) ## Error in rnorm(x) : unused argument(s) (x) ## of course f - function()function(x)stats:::rnorm(x) ## would fix this. Question 1: Suppose I defined f() as at top in a package namespace and exported it. Would a user see this same error defining g() and with rnorm redefined as above? (Assume stats is attached as usual). I presume so, but ... Question 2: If the answer to Q1 is yes, (how) can this be avoided without using fully qualified function names? Again, a quick reference to relevant docs would suffice. Thanks. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Get count by day for particular coulmn
HI, Please dput() the example data. Could anyone give a help ASAP ? You have been posting for long time and your many posts show this ASAP. I know that you got comments to that and also advised to dput() the data. Formatting your data took some time. Would it be better to follow the posting guidelines and then use ASAP? dat1- read.table(text= DATETIME,COL_A,COL_B,COL_C,COL_D 1/1/2007 0:01,0,3,0,0 1/1/2007 0:02,0,0,3,0 1/1/2007 0:03,0,3,0,0 1/2/2007 0,0,3,0,0 1/2/2007 0:01,0,3,4,0 1/2/2007 0:02,0,3,0,0 1/3/2007 0,0,0,0,0 1/3/2007 0:01,0,0,4,0 1/3/2007 0:02,0,3,0,0 ,sep=,,header=TRUE,stringsAsFactors=FALSE) dat1$DATETIME[!grepl(\\:,dat1$DATETIME)]-paste0(dat1$DATETIME[!grepl(\\:,dat1$DATETIME)],:00) res-aggregate(.~gsub(\\s+.*,,dat1$DATETIME),data=dat1[,-1],function(x) length(x[x==3])) colnames(res)[1]- colnames(dat1)[1] colnames(res)[-1]-paste0(COUNT(,colnames(res)[-1],=3)) res # DATETIME COUNT(COL_A=3) COUNT(COL_B=3) COUNT(COL_C=3) COUNT(COL_D=3) #1 1/1/2007 0 2 1 0 #2 1/2/2007 0 3 0 0 #3 1/3/2007 0 1 0 0 A.K. here i have a dataframe for eg:- DATETIMECOL_A COL_B COL_C COL_D 1/1/2007 0:01 0 3 0 0 1/1/2007 0:02 0 0 3 0 1/1/2007 0:03 0 3 0 0 ... . ... ... 1/2/20070 0 3 0 1/2/2007 0:01 0 3 4 0 1/2/2007 0:02 0 3 0 0 ... . ... ... 1/3/20070 0 0 0 1/3/2007 0:01 0 0 4 0 1/3/2007 0:02 0 3 0 0 ... . ... ... My requirement what is, i have to get the count for each day where COL_B = 3 For eg:- here i need to get like DATETIME COUNT(COL_B=3) 1/1/2007 2 1/2/2007 3 1/3/2007 1 = = and this way i tried to get, MyDF[MyDF[DATETIME]==1/2/2007] --- here this only select the row where DATETIME - column coming as 1/2/2007 - date and not selecting other rows where same date is coming (eg:- 1/1/2007 0:01). And here i need to get the complete records for that particular day, when i give date without giving timestamp. - Could anyone give a help ASAP ? - Thanks Antony. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matched samples, dataframe, panel data
Again my problem, better explained. #I have a data panel of thousands of firms, by year and industry and #one dummy variable that identifies one kind of firms (1 if the firm have an auditor; 0 if not) #and another variable the represents the firm dimension (total assets in thousand of euros) #I need to create two separated samples with the same number os firms where #one firm in the first have a corresponding firm in the second with the same #year, industry and dimension (the dimension doesn't need to be exatly the #same, it could vary in an interval of +/- 10%, for example) #My reproducible example firm1-sort(rep(1:10,5),decreasing=F) year1-rep(2000:2004,10) industry1-rep(20,50) dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1) dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789, 2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824, 23,2897,3456,7690,6022,3678,9431,2890) data1-data.frame(firm1,year1,industry1,dummy1,dimension1) data1 colnames(data1)-c(firm,year,industry,dummy,dimension) firm2-sort(rep(11:15,3),decreasing=F) year2-rep(2001:2003,5) industry2-rep(30,15) dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1) dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345) data2-data.frame(firm2,year2,industry2,dummy2,dimension2) data2 colnames(data2)-c(firm,year,industry,dummy,dimension) firm3-sort(rep(16:20,4),decreasing=F) year3-rep(2001:2004,5) industry3-rep(40,20) dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0) dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824, 23,2897) data3-data.frame(firm3,year3,industry3,dummy3,dimension3) data3 colnames(data3)-c(firm,year,industry,dummy,dimension) final1-rbind(data1,data2) final2-rbind(final1,data3) final2 final3-final2[order(final2$year,final2$industry,final2$dimension),] final3 #So my data is final3 is like this: firm year industry dummy dimension 266 2000 20 0 781 1 1 2000 20 0 2120 215 2000 20 1 2189 368 2000 20 1 2765 164 2000 20 0 3178 317 2000 20 1 3245 113 2000 20 1 4532 6 2 2000 20 0 4890 419 2000 20 0 5438 46 10 2000 20 0 7690 2 1 2001 20 0 345 378 2001 20 1 389 327 2001 20 0 1234 174 2001 20 0 2678 7 2 2001 20 1 2789 225 2001 20 1 4289 47 10 2001 20 0 6022 123 2001 20 1 6593 276 2001 20 0 35489 429 2001 20 1 37824 60 14 2001 30 1 2341 54 12 2001 30 0 2345 57 13 2001 30 1 3245 51 11 2001 30 0 12456 63 15 2001 30 1 12900 78 19 2001 40 1 389 74 18 2001 40 1 1288 82 20 2001 40 0 6438 70 17 2001 40 1 6754 66 16 2001 40 0 23456 439 2002 20 023 337 2002 20 125 3 1 2002 20 1 2341 286 2002 20 0 2345 8 2 2002 20 1 3412 48 10 2002 20 1 3678 184 2002 20 0 235 2002 20 0 8543 133 2002 20 0 12900 388 2002 20 1 23456 64 15 2002 30 0 123 52 11 2002 30 0 781 58 13 2002 30 1 2120 61 14 2002 30 1 5678 55 12 2002 30 0 5754 67 16 2002 40 0 1181 75 18 2002 40 1 1200 71 17 2002 40 0 8976 79 19 2002 40 0 23456 83 20 2002 40 1 24824 143 2003 20 0 123 245 2003 20 0 637 194 2003 20 1 647 347 2003 20 0 1200 398 2003 20 1 2367 449 2003 20 0 2897 4 1 2003 20 1 5678 296 2003 20 0 5754 49 10 2003 20 1 9431 9 2 2003 20 0 9500 59 13 2003 30 1 345 65 15 2003 30 1 2345 56 12 2003 30 0 8976 62 14 2003 30 1 10900 53 11 2003 30 0 32489 84 20 2003 40 023 76 18 2003 40 1 2345 80 19 2003 40 0 2367 72 17 2003 40 1 3245 68 16 2003 40 1 32489 153 2004 20 0 2345 357 2004 20 1 2345 50 10 2004 20 1 2890 459 2004 20 0 3456 408 2004 20 0 3892 102 2004 20 1 8765 306
Re: [R] Add prediction interval forest plot (package meta)
At 12:39 07/06/2013, Bernd Weiß wrote: Am 07.06.2013 13:17, schrieb Michael Dewey: At 19:34 06/06/2013, Bernd Weiss wrote: Dear all, I am struggling to add a prediction interval to a forest plot that was created with forest.meta(), package meta. I checked the source of forest.meta() and realized that it is heavily relying on grid. I am lacking any experience with grid graphics. So, I am having difficulties to find out where the random effects estimate is actually plotted in order to add prediction intervals. I appreciate this is not a direct answer to your question but the equivalent function in metafor (available from CRAN) does this and it might be easier to use that. Thanks, Michael! Yes, I vaguely remember that the metafor package can calculate the PIs but most of our analyses rely on the meta package. On the other hand, this might be a good reason to finally switch to metafor. You probably refer to the function addpoly() which can be used for that. I will check it out. I was thinking of the addcred parameter to forest.rma.uni in fact, although you could roll your own. Again, thanks for your help, Bernd Michael Dewey i...@aghmed.fsnet.co.uk http://www.aghmed.fsnet.co.uk/home.html __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with ODBC connection
Hello again, I am having problem with ODBC connection using the RODBC package. I am basically trying to read the attached Excel-2003 file using RODBC package. Here is my code: head(sqlFetch(odbcConnectExcel(d:/1.xls), ), 30); odbcCloseAll() Criteria s d fd ffd1 f1fd2f2 fd3 f3 F12 F13 F14 F15 F16 F17 F18 F19 F20 1 a NA NA NA NA 0. 0.27755576 -0.00040332321NA NA NA NA NA NA NA NA NA NA NA NA 2 s NA 0 NA NA 0. 0. 0.000NA NA NA NA NA NA NA NA NA NA NA NA 3 d NA 0 NA NA 0.01734723 0.06938894 0.2775558 5.00 NA NA NA NA NA NA NA NA NA NA NA 4 f NA NA NA NA NA NA NA -4.25 NA NA NA NA NA NA NA NA NA NA NA 5 f NA 0 NA NA 0. 0. 0.000 -1.53 NA NA NA NA NA NA NA NA NA NA NA 6 f NA NA NA NA NA NA 0.000 0.00 NA NA NA NA NA NA NA NA NA NA NA 7 f NA NA NA NA NA NA 0.000NA NA NA NA NA NA NA NA NA NA NA NA 8 f NA 0 NA NA NA NA NANA NA NA NA NA NA NA NA NA NA NA NA 9 f NA 0 NA NA NA NA NANA NA NA NA NA NA NA NA NA NA NA NA 10f NA NA NA NA NA NA NANA NA NA NA NA NA NA NA NA NA NA NA 11f NA NA NA NA NA NA NANA NA NA NA NA NA NA NA NA NA NA NA 12f NA NA NA NA NA NA NANA NA NA NA NA NA NA NA NA NA NA NA 13f NA NA NA NA NA NA NANA NA NA NA NA NA NA NA NA NA NA NA Here you see the data in second column could not read at all. Can somebody point me if I did something wrong? Thanks and regards, __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Clogit R and Stata
Sorry to once again write a message but I'm once again stumped and am having no luck finding a solution anywhere else. This question requires some finesse in both R and STATA so hopefully I will be able to get an answer here. I am much more adept in R and am trying to replicate the results of a STATA file in R. Hopefully this is a proper forum for such questions. This is the code for the clogit in STATA clogit sftpcons sftptv2a3 sftptv2a4 sftptv2a5 sftptv2a2 sftptv2a6 logim maccat disp4cat if sample==1 glb_ind==Y, group(stratida) and I tried to replicate it using clogit1-clogit(sftpcons~sftptv2a3+sftptv2a4+sftptv2a5+sftptv2a2+sftptv2a6+logim+maccat+disp4cat+strata(stratida), dframe, sample==1 | glb_ind==Y) but got different results What did I do wrong here? I interpreted the STATA clogit as run this logit as long as the sample is 1 and glb_ind=Y What should I be doing instead? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Bionconductor help
Hi I am trying to do some data analysis using R and Bioconductor. I have a function to read my data called ReadAffyData and a function to plot the data called preqc. I want to know if there is any way I can extract information (read: Data) from the ReadAffyData function into preqc in order to produce plots? I have not completely understood the inheritance in R and I am getting errors similar to the one below. I am new to R and could do with any help in this aspect! ReadAffyData - function( filename ) { require(affy) require(annotate) Cov - read.table( filename, sep=\t, header=1, quote=, comment= ) if( ! all( c( Filename, Label, Repl, Trt ) %in% colnames( Cov ) ) ) { stop( Missing mandatory column ) } Cov - Cov[order(Cov$Trt),] i - table(Cov$Trt) Cov$Repl - unlist( lapply( i, function( j ) 1:j ) ) Cov$Label - paste( as.character( Cov$Trt ), Cov$Repl, sep=: ) rownames(Cov) - as.character(Cov$Label) tmp1 - colnames( Cov ) tmp2 - rep( , length( Cov ) ) for( i in 1:length( tmp2 ) ) { tmp2[i] - paste( sort( unique( as.character( Cov[,i] ) ) ), collapse=/ ) } labelDescription - data.frame( labelDescription=tmp2 ) rownames( labelDescription ) - tmp1 tmp - new( AnnotatedDataFrame, data=Cov, varMetadata=labelDescription) Data - ReadAffy( sampleNames=as.character( Cov$Label ),phenoData=tmp, verbose=TRUE ) } preQC - function(name){ ReadAffyData(name) plotDensity( log2( pm( Data ) ), xlab=Log2( Intensity ), ylab=Density, main=Raw(PM)) } preQC(cov.txt) 1 reading C:/CEL/GSM311471.CEL ...instantiating an AffyBatch (intensity a 506944x24 matrix)...done. Reading in : C:/CEL/GSM311471.CEL Reading in : C:/CEL/GSM311472.CEL Reading in : C:/CEL/GSM311473.CEL Reading in : C:/CEL/GSM311474.CEL Reading in : C:/CEL/GSM311475.CEL . . . Error in pm(Data) : error in evaluating the argument 'object' in selecting a method for function 'pm': Error: object 'Data' not found sessionInf() R version 3.0.1 (2013-05-16) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 LC_MONETARY=English_India.1252 [4] LC_NUMERIC=C LC_TIME=English_India.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] annotate_1.38.0 AnnotationDbi_1.22.6 affy_1.38.1 Biobase_2.20.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] affyio_1.28.0 BiocInstaller_1.10.1 DBI_0.2-7 IRanges_1.18.1preprocessCore_1.22.0 [6] RSQLite_0.11.4stats4_3.0.1 tools_3.0.1 XML_3.96-1.1 xtable_1.7-1 [11] zlibbioc_1.6.0 Thanking you, Ipsitha Graduate Student London [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] cannot load pbdMPI package after compilation
Hello, I try to install pbdMPI. Compilation successful, but load fails with segfault. Is anyone can help me? R version 3.0.0 pbdMPI version 0.1-6 Intel compiler version 13.1.1 OpenMPI version 1.6.4-1 CPU Intel x86_64 # R CMD INSTALL pbdMPI_0.1-6.tar.gz .. checking for gcc... icc -std=gnu99 checking whether the C compiler works... yes checking for C compiler default output file name... a.out checking for suffix of executables... checking whether we are cross compiling... no checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether icc -std=gnu99 accepts -g... yes checking for icc -std=gnu99 option to accept ISO C89... none needed checking for mpirun... mpirun checking for mpiexec... mpiexec checking for orterun... orterun checking for sed... /bin/sed checking for mpicc... mpicc checking for ompi_info... ompi_info checking for mpich2version... F found sed, mpicc, and ompi_info ... TMP_INC_DIRS = /opt/openmpi/1.6.4-1/intel-13.1.1/include checking /opt/openmpi/1.6.4-1/intel-13.1.1/include ... found /opt/openmpi/1.6.4-1/intel-13.1.1/include/mpi.h ... TMP_LIB_DIRS = /opt/openmpi/1.6.4-1/intel-13.1.1/lib64 checking /opt/openmpi/1.6.4-1/intel-13.1.1/lib64 ... found /opt/openmpi/1.6.4-1/intel-13.1.1/lib64/libmpi.so ... found mpi.h and libmpi.so ... TMP_INC = /opt/openmpi/1.6.4-1/intel-13.1.1/include TMP_LIB = /opt/openmpi/1.6.4-1/intel-13.1.1/lib64 checking for openpty in -lutil... yes checking for main in -lpthread... yes *** Results of pbdMPI package configure * TMP_INC = /opt/openmpi/1.6.4-1/intel-13.1.1/include TMP_LIB = /opt/openmpi/1.6.4-1/intel-13.1.1/lib64 MPI_ROOT = MPITYPE = OPENMPI MPI_INCLUDE_PATH = /opt/openmpi/1.6.4-1/intel-13.1.1/include MPI_LIBPATH = /opt/openmpi/1.6.4-1/intel-13.1.1/lib64 MPI_LIBS = -lutil -lpthread MPI_DEFS = -DMPI2 MPI_INCL2 = PKG_CPPFLAGS = -I/opt/openmpi/1.6.4-1/intel-13.1.1/include -DMPI2 -DOPENMPI PKG_LIBS = -L/opt/openmpi/1.6.4-1/intel-13.1.1/lib64 -lmpi -lutil -lpthread * .. icc -std=gnu99 -I/usr/local/R/3.0.0/intel13/lib64/R/include -DNDEBUG -I/opt/openmpi/1.6.4-1/intel-13.1.1/include -DMPI2 -DOPENMPI -O3 -fp-model precise -pc 64 -axAVX-fpic -O3 -fp-model precise -pc 64 -axAVX -c comm_errors.c -o comm_errors.o icc -std=gnu99 -I/usr/local/R/3.0.0/intel13/lib64/R/include -DNDEBUG -I/opt/openmpi/1.6.4-1/intel-13.1.1/include -DMPI2 -DOPENMPI -O3 -fp-model precise -pc 64 -axAVX-fpic -O3 -fp-model precise -pc 64 -axAVX -c comm_sort_double.c -o comm_sort_double.o . .. ** testing if installed package can be loaded sh: line 1: 2905 Segmentation fault '/usr/local/R/3.0.0/intel13/lib64/R/bin/R' --no-save --slave 21 /tmp/RtmpGkncGK/file1e541c57190 ERROR: loading failed *** caught segfault *** address (nil), cause 'unknown' Traceback: 1: .Call(spmd_initialize, PACKAGE = pbdMPI) 2: fun(libname, pkgname) 3: doTryCatch(return(expr), name, parentenv, handler) 4: tryCatchOne(expr, names, parentenv, handlers[[1L]]) 5: tryCatchList(expr, classes, parentenv, handlers) 6: tryCatch(fun(libname, pkgname), error = identity) 7: runHook(.onLoad, env, package.lib, package) 8: loadNamespace(package, c(which.lib.loc, lib.loc)) 9: doTryCatch(return(expr), name, parentenv, handler) 10: tryCatchOne(expr, names, parentenv, handlers[[1L]]) 11: tryCatchList(expr, classes, parentenv, handlers) 12: tryCatch(expr, error = function(e) {call - conditionCall(e) if (!is.null(call)) {if (identical(call[[1L]], quote(doTryCatch))) call - sys.call(-4L)dcall - deparse(call)[1L]prefix - paste(Error in, dcall, : ) LONG - 75Lmsg - conditionMessage(e)sm - strsplit(msg, \n)[[1L]]w - 14L + nchar(dcall, type = w) + nchar(sm[1L], type = w)if (is.na(w)) w - 14L + nchar(dcall, type = b) + nchar(sm[1L], type = b)if (w LONG) prefix - paste0(prefix, \n )}else prefix - Error : msg - paste0(prefix, conditionMessage(e), \n) .Internal(seterrmessage(msg[1L]))if (!silent identical(getOption(show.error.messages), TRUE)) { cat(msg, file = stderr()).Internal(printDeferredWarnings()) }invisible(structure(msg, class = try-error, condition = e))}) 13: try({ns - loadNamespace(package, c(which.lib.loc, lib.loc)) env - attachNamespace(ns, pos = pos, deps)}) 14: library(pkg_name, lib.loc = lib, character.only = TRUE, logical.return = TRUE) 15: withCallingHandlers(expr, packageStartupMessage = function(c) invokeRestart(muffleMessage)) 16: suppressPackageStartupMessages(library(pkg_name, lib.loc = lib, character.only = TRUE, logical.return = TRUE)) 17: doTryCatch(return(expr), name, parentenv, handler) 18: tryCatchOne(expr, names, parentenv, handlers[[1L]]) 19: tryCatchList(expr, classes, parentenv, handlers) 20: tryCatch(expr, error =
Re: [R] generating a bar chart with two axis for co-linear variable
Arun, Perfect, this is what I was looking for. Thanks, Sudha Krishnan -Original Message- From: arun [mailto:smartpink...@yahoo.com] Sent: Friday, June 07, 2013 2:36 AM To: Sudha Krishnan Cc: R help Subject: Re: [R] generating a bar chart with two axis for co-linear variable HI, Not sure if this is what you wanted. pdf(BarplotsNew.pdf) library(plotrix) lst2-lapply(seq_len(ncol(dat1)),function(i){ Ctdat- table(dat1[,i]) Ctdat1-(Ctdat/sum(Ctdat))*100 dat2-data.frame(Ctdat,Ctdat1,stringsAsFactors=FALSE)[,-3] colnames(dat2)[3]-Rel.Freq dat2[,1]- as.numeric(as.character(dat2[,1])) with(dat2,twoord.plot(Var1,Rel.Freq,Freq, lylim=c(0,100),rylim=c(0,1000), ylab=Relative Frequency, rylab=Frequency,main=paste(Bar plot:,colnames(dat1)[i],sep= ), type=c(bar,l),lcol=2,rcol=4,xtickpos=Var1,xticklab=Var1)) }) dev.off() A.K. - Original Message - From: arun smartpink...@yahoo.com To: Sudha Krishnan sudha.krish...@marlabs.com Cc: R help r-help@r-project.org Sent: Thursday, June 6, 2013 2:44 PM Subject: Re: [R] generating a bar chart with two axis for co-linear variable HI, May be this helps: dat1- read.table(sampledata.txt,header=TRUE,sep=,,stringsAsFactors=FALSE) pdf(Barplots.pdf) lst1-lapply(seq_len(ncol(dat1)),function(i) {Ctdat- table(dat1[,i]);Ctdat1-(Ctdat/sum(Ctdat))*100;barplot(Ctdat1,ylim=c(0,100),xlab=colnames(dat1)[i],ylab=Relative Frequency,main=paste(Barplot:,colnames(dat1)[i],sep= ))}) dev.off() A.K. - Original Message - From: Sudha Krishnan sudha.krish...@marlabs.com To: r-help@r-project.org r-help@r-project.org Cc: Sent: Thursday, June 6, 2013 2:37 AM Subject: [R] generating a bar chart with two axis for co-linear variable Hello Dimitris, I was goggling for some help on Sensitivity vs 1-specificity and saw your link. I hope you can be of help to me in one of the issue that I am facing in generating combo chart(bar chart and plot). I am a novice and have some difficulty in getting this logic correct. I am give a dataset (I am attaching a sample dataset). I am using a barplot() and passing values for percentage frequency and the corresponding variables. I am struck here, what my function does is only calculate the frequency for the listed variables and not the frequency percentage. Is there a method or a script with which I can pass the frequency percent and the related values as category columns for x axis? I will attach the graphs that I have generated so that you can suggest the better way. Sampledata - Sampledata.txt What my function does to calculate the frequency with category names in X axis - 1.png My requirement is to generate percentage frequency of the variable in y1 and not the frequency itself. 2.png (where x categories are missing) Thanks, Sudha Krishnan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] subset with non logical rules
aah perfect, exactly what i was looking for! Thank you On Fri, Jun 7, 2013 at 3:24 PM, arun kirshna [via R] ml-node+s789695n4668923...@n4.nabble.com wrote: HI, Try: ?split() source(http://www.openintro.org/stat/data/cdc.R;) str(cdc) #'data.frame':2 obs. of 9 variables: # $ genhlth : Factor w/ 5 levels excellent,very good,..: 3 3 3 3 2 2 2 2 3 3 ... # $ exerany : num 0 0 1 1 0 1 1 0 0 1 ... # $ hlthplan: num 1 1 1 1 1 1 1 1 1 1 ... # $ smoke100: num 0 1 1 0 0 0 0 0 1 0 ... # $ height : num 70 64 60 66 61 64 71 67 65 70 ... # $ weight : int 175 125 105 132 150 114 194 170 150 180 ... # $ wtdesire: int 175 115 105 124 130 114 185 160 130 170 ... # $ age : int 77 33 49 42 55 55 31 45 27 44 ... # $ gender : Factor w/ 2 levels m,f: 1 2 2 2 2 2 1 1 2 1 ... cdc$genhlth- as.character(cdc$genhlth) cdclst1- split(cdc,cdc$genhlth) lapply(cdclst1,head,2) #$excellent # genhlth exerany hlthplan smoke100 height weight wtdesire age gender #11 excellent 111 69186 175 46 m #13 excellent 101 66185 220 21 m # #$fair # genhlth exerany hlthplan smoke100 height weight wtdesire age gender #12fair 111 69168 148 62 m #15fair 100 69170 170 23 m # #$good # genhlth exerany hlthplan smoke100 height weight wtdesire age gender #1good 010 70175 175 77 m #2good 011 64125 115 33 f # #$poor # genhlth exerany hlthplan smoke100 height weight wtdesire age gender #53poor 111 62140 130 64 f #79poor 110 63142 120 52 f #$`very good` #genhlth exerany hlthplan smoke100 height weight wtdesire age gender #5 very good 010 61150 130 55 f #6 very good 110 64114 114 55 f sapply(cdclst1,nrow) #excellent fair good poor very good # 4657 2019 5675 677 6972 cdcGood-cdclst1[[good]] str(cdcGood) #'data.frame':5675 obs. of 9 variables: # $ genhlth : chr good good good good ... # $ exerany : num 0 0 1 1 0 1 1 0 1 1 ... # $ hlthplan: num 1 1 1 1 1 1 1 0 1 1 ... # $ smoke100: num 0 1 1 0 1 0 1 1 1 1 ... # $ height : num 70 64 60 66 65 70 73 67 75 65 ... # $ weight : int 175 125 105 132 150 180 185 156 200 160 ... # $ wtdesire: int 175 115 105 124 130 170 175 150 190 140 ... # $ age : int 77 33 49 42 27 44 79 47 43 54 ... # $ gender : Factor w/ 2 levels m,f: 1 2 2 2 2 1 1 1 1 2 ... A.K. Hi I am trying to figure out how to subset a bunch of data. As an example I am using the cdc data from openintro.org. In the first column with the name genhlth there are various options that the persons could respond. For exmaple good very good and poor. Now what i would like to do is to seperate the data so that everyone who answered good are stored in one variable and everyone who answered poor are in another variable. Now I know i could just do subset(cdc, cdc$genhlth == poor) to get the poor, but would really like for a code that would seperate data into each group, regardless of what the text or the number of groups are. Can anyone give me a hint? __ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. If you reply to this email, your message will be added to the discussion below: http://r.789695.n4.nabble.com/subset-with-non-logical-rules-tp4668906p4668923.html To unsubscribe from subset with non logical rules, click here. NAML -- View this message in context: http://r.789695.n4.nabble.com/subset-with-non-logical-rules-tp4668906p4668924.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Evaluation of function names...
On 07/06/2013 14:49, Bert Gunter wrote: Folks: Feel free to provide me a link or reference instead of an answer. Preamble: f - function() function(x)rnorm(x) g - f() g(3) ## [1] -0.4448492 -0.2379978 -0.4537394 ## But rnorm - function()1 ## nasty nasty g(3) ## Error in rnorm(x) : unused argument(s) (x) ## of course f - function()function(x)stats:::rnorm(x) ## would fix this. Question 1: Suppose I defined f() as at top in a package namespace and exported it. Would a user see this same error defining g() and with rnorm redefined as above? (Assume stats is attached as usual). I presume so, but ... Not if done right. The package should importFrom(stats, rnorm). Question 2: If the answer to Q1 is yes, (how) can this be avoided without using fully qualified function names? Again, a quick reference to relevant docs would suffice. 'Writing R Extensions' says 'The namespace controls the search strategy for variables used by functions in the package. If not found locally, R searches the package namespace first, then the imports, then the base namespace and then the normal search path.' -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Get count by day for particular coulmn
here i have a dataframe for eg:- DATETIMECOL_A COL_B COL_C COL_D 1/1/2007 0:01 0 3 0 0 1/1/2007 0:02 0 0 3 0 1/1/2007 0:03 0 3 0 0 ... . ... ... 1/2/20070 0 3 0 1/2/2007 0:01 0 3 4 0 1/2/2007 0:02 0 3 0 0 ... . ... ... 1/3/20070 0 0 0 1/3/2007 0:01 0 0 4 0 1/3/2007 0:02 0 3 0 0 ... . ... ... My requirement what is, i have to get the count for each day where COL_B = 3 For eg:- here i need to get like DATETIME COUNT(COL_B=3) 1/1/2007 2 1/2/2007 3 1/3/2007 1 = = and this way i tried to get, MyDF[MyDF[DATETIME]==1/2/2007] --- here this only select the row where DATETIME - column coming as 1/2/2007 - date and not selecting other rows where same date is coming (eg:- 1/1/2007 0:01). And here i need to get the complete records for that particular day, when i give date without giving timestamp. - Could anyone give a help ASAP ? - Thanks Antony. -- View this message in context: http://r.789695.n4.nabble.com/Get-count-by-day-for-particular-coulmn-tp4668915.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Evaluation of function names...
Thank you. That answers my question. -- Bert On Fri, Jun 7, 2013 at 7:09 AM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On 07/06/2013 14:49, Bert Gunter wrote: Folks: Feel free to provide me a link or reference instead of an answer. Preamble: f - function() function(x)rnorm(x) g - f() g(3) ## [1] -0.4448492 -0.2379978 -0.4537394 ## But rnorm - function()1 ## nasty nasty g(3) ## Error in rnorm(x) : unused argument(s) (x) ## of course f - function()function(x)stats:::rnorm(x) ## would fix this. Question 1: Suppose I defined f() as at top in a package namespace and exported it. Would a user see this same error defining g() and with rnorm redefined as above? (Assume stats is attached as usual). I presume so, but ... Not if done right. The package should importFrom(stats, rnorm). Question 2: If the answer to Q1 is yes, (how) can this be avoided without using fully qualified function names? Again, a quick reference to relevant docs would suffice. 'Writing R Extensions' says 'The namespace controls the search strategy for variables used by functions in the package. If not found locally, R searches the package namespace first, then the imports, then the base namespace and then the normal search path.' -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Bionconductor help
Hi, You would get better response if you post at Bioconductor mailing list. http://www.bioconductor.org/help/mailing-list/ A.K. - Original Message - From: Payal Urs payal@gmail.com To: r-help@r-project.org Cc: Sent: Friday, June 7, 2013 6:12 AM Subject: [R] Bionconductor help Hi I am trying to do some data analysis using R and Bioconductor. I have a function to read my data called ReadAffyData and a function to plot the data called preqc. I want to know if there is any way I can extract information (read: Data) from the ReadAffyData function into preqc in order to produce plots? I have not completely understood the inheritance in R and I am getting errors similar to the one below. I am new to R and could do with any help in this aspect! ReadAffyData - function( filename ) { require(affy) require(annotate) Cov - read.table( filename, sep=\t, header=1, quote=, comment= ) if( ! all( c( Filename, Label, Repl, Trt ) %in% colnames( Cov ) ) ) { stop( Missing mandatory column ) } Cov - Cov[order(Cov$Trt),] i - table(Cov$Trt) Cov$Repl - unlist( lapply( i, function( j ) 1:j ) ) Cov$Label - paste( as.character( Cov$Trt ), Cov$Repl, sep=: ) rownames(Cov) - as.character(Cov$Label) tmp1 - colnames( Cov ) tmp2 - rep( , length( Cov ) ) for( i in 1:length( tmp2 ) ) { tmp2[i] - paste( sort( unique( as.character( Cov[,i] ) ) ), collapse=/ ) } labelDescription - data.frame( labelDescription=tmp2 ) rownames( labelDescription ) - tmp1 tmp - new( AnnotatedDataFrame, data=Cov, varMetadata=labelDescription) Data - ReadAffy( sampleNames=as.character( Cov$Label ),phenoData=tmp, verbose=TRUE ) } preQC - function(name){ ReadAffyData(name) plotDensity( log2( pm( Data ) ), xlab=Log2( Intensity ), ylab=Density, main=Raw(PM)) } preQC(cov.txt) 1 reading C:/CEL/GSM311471.CEL ...instantiating an AffyBatch (intensity a 506944x24 matrix)...done. Reading in : C:/CEL/GSM311471.CEL Reading in : C:/CEL/GSM311472.CEL Reading in : C:/CEL/GSM311473.CEL Reading in : C:/CEL/GSM311474.CEL Reading in : C:/CEL/GSM311475.CEL . . . Error in pm(Data) : error in evaluating the argument 'object' in selecting a method for function 'pm': Error: object 'Data' not found sessionInf() R version 3.0.1 (2013-05-16) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 LC_MONETARY=English_India.1252 [4] LC_NUMERIC=C LC_TIME=English_India.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] annotate_1.38.0 AnnotationDbi_1.22.6 affy_1.38.1 Biobase_2.20.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] affyio_1.28.0 BiocInstaller_1.10.1 DBI_0.2-7 IRanges_1.18.1 preprocessCore_1.22.0 [6] RSQLite_0.11.4 stats4_3.0.1 tools_3.0.1 XML_3.96-1.1 xtable_1.7-1 [11] zlibbioc_1.6.0 Thanking you, Ipsitha Graduate Student London [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Get count by day for particular coulmn
Hi, On Fri, Jun 7, 2013 at 7:38 AM, R_Antony antony.akk...@ge.com wrote: here i have a dataframe for eg:- DATETIMECOL_A COL_B COL_C COL_D 1/1/2007 0:01 0 3 0 0 1/1/2007 0:02 0 0 3 0 1/1/2007 0:03 0 3 0 0 ... . ... ... 1/2/20070 0 3 0 1/2/2007 0:01 0 3 4 0 1/2/2007 0:02 0 3 0 0 ... . ... ... 1/3/20070 0 0 0 1/3/2007 0:01 0 0 4 0 1/3/2007 0:02 0 3 0 0 ... . ... ... My requirement what is, i have to get the count for each day where COL_B = 3 For eg:- here i need to get like DATETIME COUNT(COL_B=3) 1/1/2007 2 1/2/2007 3 1/3/2007 1 = = Since you didn't provide reproducible data (dput() is great for that), here's an example with fake data: MyDF - data.frame(DATETIME = c(1/1/2007, 1/1/2007, 1/1/2007, 1/2/2007, 1/2/2007, 1/2/2007, 1/3/2007, 1/3/2007, 1/3/2007), COL_A = c(0, 0, 0, 1, 0, 1, 2, 3, 1), COL_B = c(0, 3, 0, 3, 3, 1, 0, 1, 2), COL_C = c(1, 2, 3, 1, 2, 3, 1, 2, 3), stringsAsFactors=FALSE) aggregate(COL_B ~ DATETIME, data=MyDF, FUN=function(x)sum(x == 3)) and this way i tried to get, MyDF[MyDF[DATETIME]==1/2/2007] --- here this only select the row where DATETIME - column coming as 1/2/2007 - date and not selecting other rows where same date is coming (eg:- 1/1/2007 0:01). And here i need to get the complete records for that particular day, when i give date without giving timestamp. You omitted a comma: MyDF[MyDF[DATETIME]==1/2/2007, ] You might prefer: subset(MyDF, DATETIME == 1/2/2007) Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] increase outlier size on boxplot
Hi, Does anybody know how to increase boxplot outlier symbol size? And also the line from min/max to outlier Thanks -- Shane [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] increase outlier size on boxplot
Hi Shane, On Fri, Jun 7, 2013 at 10:36 AM, Shane Carey careys...@gmail.com wrote: Hi, Does anybody know how to increase boxplot outlier symbol size? And also the line from min/max to outlier If you read the help for boxplot, it will send you to the help for bxp for more information on custom plotting. ?bxp will offer you a vast array of possibilities, including: whisklty, whisklwd, whiskcol: whisker line type (default: ‘dashed’), width, and color. outlty, outlwd, outpch, outcex, outcol, outbg: outlier line type, line width, point character, point size expansion, color, and background color. accompanied by reproducible examples. Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] increase outlier size on boxplot
Great, thanks Sarah On Fri, Jun 7, 2013 at 3:46 PM, Sarah Goslee sarah.gos...@gmail.com wrote: Hi Shane, On Fri, Jun 7, 2013 at 10:36 AM, Shane Carey careys...@gmail.com wrote: Hi, Does anybody know how to increase boxplot outlier symbol size? And also the line from min/max to outlier If you read the help for boxplot, it will send you to the help for bxp for more information on custom plotting. ?bxp will offer you a vast array of possibilities, including: whisklty, whisklwd, whiskcol: whisker line type (default: dashed), width, and color. outlty, outlwd, outpch, outcex, outcol, outbg: outlier line type, line width, point character, point size expansion, color, and background color. accompanied by reproducible examples. Sarah -- Sarah Goslee http://www.functionaldiversity.org -- Shane [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to get a subscript of a vector?
Hi, ?which() which(A==2) #[1] 3 7 10 A.K. Suppose there is a vector A - c(1,3,2,6,7,8,2,1,3,2). Now, I want to get the subscript of elements of A which equal 2. How can I do it ? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ROC Curves for Neural Networks
Trying to plot an ROC curve to determine effectiveness of a neural network model in R. using the neuralnet package I created nn1a: nn1a-neuralnet(warstns ~ aigenlz+ p2l+anoclinv2+flinstab+mill+minconl+fllgdppclz+lpoplz+floil+flmtnestz+c_peaceyears, data=na.omit(dfram1), hidden=4) then using the lroc function from the epicalc package I tried lroc(nn1a) but came up with the error message Error in `colnames-`(`*tmp*`, value = c(Non-diseased, Diseased)) : length of 'dimnames' [2] not equal to array extent Does anyone know any ways to get ROC curves to work with neural network models? Any other packages, model manipulation, etc? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Bionconductor help
On 06/07/2013 07:16 AM, arun wrote: Hi, You would get better response if you post at Bioconductor mailing list. http://www.bioconductor.org/help/mailing-list/ Agreed, though in this case I think ReadAffyData - function( filename ) { ... Data - ReadAffy( sampleNames=as.character( Cov$Label), phenoData=tmp, verbose=TRUE ) } preQC - function(name){ ReadAffyData(name) what needs to happen here is that the return value needs to be assigned to a variable, Data - ReadAffyData(name) Martin plotDensity( log2( pm( Data ) ), xlab=Log2( Intensity ), ylab=Density, main=Raw(PM)) } A.K. - Original Message - From: Payal Urs payal@gmail.com To: r-help@r-project.org Cc: Sent: Friday, June 7, 2013 6:12 AM Subject: [R] Bionconductor help Hi I am trying to do some data analysis using R and Bioconductor. I have a function to read my data called ReadAffyData and a function to plot the data called preqc. I want to know if there is any way I can extract information (read: Data) from the ReadAffyData function into preqc in order to produce plots? I have not completely understood the inheritance in R and I am getting errors similar to the one below. I am new to R and could do with any help in this aspect! ReadAffyData - function( filename ) { ... Data - ReadAffy( sampleNames=as.character( Cov$Label ),phenoData=tmp, verbose=TRUE ) } preQC - function(name){ ReadAffyData(name) plotDensity( log2( pm( Data ) ), xlab=Log2( Intensity ), ylab=Density, main=Raw(PM)) } preQC(cov.txt) 1 reading C:/CEL/GSM311471.CEL ...instantiating an AffyBatch (intensity a 506944x24 matrix)...done. Reading in : C:/CEL/GSM311471.CEL Reading in : C:/CEL/GSM311472.CEL Reading in : C:/CEL/GSM311473.CEL Reading in : C:/CEL/GSM311474.CEL Reading in : C:/CEL/GSM311475.CEL . . . Error in pm(Data) : error in evaluating the argument 'object' in selecting a method for function 'pm': Error: object 'Data' not found sessionInf() R version 3.0.1 (2013-05-16) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 LC_MONETARY=English_India.1252 [4] LC_NUMERIC=C LC_TIME=English_India.1252 attached base packages: [1] parallel stats graphics grDevices utils datasets methods base other attached packages: [1] annotate_1.38.0 AnnotationDbi_1.22.6 affy_1.38.1 Biobase_2.20.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] affyio_1.28.0 BiocInstaller_1.10.1 DBI_0.2-7 IRanges_1.18.1preprocessCore_1.22.0 [6] RSQLite_0.11.4stats4_3.0.1 tools_3.0.1 XML_3.96-1.1 xtable_1.7-1 [11] zlibbioc_1.6.0 Thanking you, Ipsitha Graduate Student London [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] glm.nb error
Dear R Community, I have encountered a problem while using the R function glm.nb. The code that produce the error was following two lines: group=c(1,1,1,1,0,0,0,0) fit=glm.nb(y~group) While the y contains 8 sets of number like: gene2750 1 0 0 1 5 1 0 Error message: Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: There were 50 or more warnings (use warnings() to see the first 50) Execution halted Information of my system: sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base Does anyone happen to have some hit on how to solve this? Appreciate for any response. Thanks in advance, Daofeng [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Clogit R and Stata
On Jun 7, 2013, at 15:34 , Richard Beckett wrote: Sorry to once again write a message but I'm once again stumped and am having no luck finding a solution anywhere else. This question requires some finesse in both R and STATA so hopefully I will be able to get an answer here. I am much more adept in R and am trying to replicate the results of a STATA file in R. Hopefully this is a proper forum for such questions. This is the code for the clogit in STATA clogit sftpcons sftptv2a3 sftptv2a4 sftptv2a5 sftptv2a2 sftptv2a6 logim maccat disp4cat if sample==1 glb_ind==Y, group(stratida) and I tried to replicate it using clogit1-clogit(sftpcons~sftptv2a3+sftptv2a4+sftptv2a5+sftptv2a2+sftptv2a6+logim+maccat+disp4cat+strata(stratida), dframe, sample==1 | glb_ind==Y) but got different results What did I do wrong here? I interpreted the STATA clogit as run this logit as long as the sample is 1 and glb_ind=Y What should I be doing instead? An rather than | in the R version might help. Other than that, we're a bit short on clues unless you provide some output. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm.nb error
Hi, On Fri, Jun 7, 2013 at 10:44 AM, Daofeng Li lid...@gmail.com wrote: Dear R Community, I have encountered a problem while using the R function glm.nb. The code that produce the error was following two lines: group=c(1,1,1,1,0,0,0,0) fit=glm.nb(y~group) While the y contains 8 sets of number like: gene2750 1 0 0 1 5 1 0 Error message: Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: There were 50 or more warnings (use warnings() to see the first 50) Execution halted I'd assume there is a missing value somewhere. But we really need a reproducible example to be certain. You could use dput(head(yourdata, 20)) to give us something to work with (as long as that much of your data throws the error). At the very least, we probably also need str(y) str(group) Are you certain there are no missing values in your data? The problem may not be that obvious, but that's an easy place to start. Sarah Information of my system: sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base Does anyone happen to have some hit on how to solve this? Appreciate for any response. Thanks in advance, Daofeng -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm.nb error
On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote: Dear R Community, I have encountered a problem while using the R function glm.nb. The code that produce the error was following two lines: group=c(1,1,1,1,0,0,0,0) fit=glm.nb(y~group) While the y contains 8 sets of number like: gene2750 1 0 0 1 5 1 0 Error message: Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: There were 50 or more warnings (use warnings() to see the first 50) Execution halted Information of my system: sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base Does anyone happen to have some hit on how to solve this? Appreciate for any response. Thanks in advance, Daofeng There is something wrong with your actual 'y' or 'group' that is not evident from the above info: group - c(1, 1, 1, 1, 0, 0, 0, 0) y - c(0, 1, 0, 0, 1, 5, 1, 0) require(MASS) Loading required package: MASS glm.nb(y ~ group) Call: glm.nb(formula = y ~ group, init.theta = 1.711564307, link = log) Coefficients: (Intercept)group 0.5596 -1.9459 Degrees of Freedom: 7 Total (i.e. Null); 6 Residual Null Deviance: 10.23 Residual Deviance: 6.848AIC: 25.25 Check str(y) and str(group) You should also be sure to note in your posts when you are using a function from a non-base package, in this case MASS, which is not indicated in your sessionInfo() above, so something is amiss there as well. Regards, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matched samples, dataframe, panel data
Hi, May be this helps: lst1-split(final3,list(final3$year,final3$industry)) lst2-lst1[lapply(lst1,nrow)0] lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y (x$dimension+x$dimension*0.1)) (y (x$dimension-x$dimension*0.1)),])) lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2]) lst5-lapply(lst4,function(x)x[!duplicated(x)]) lst6-lst5[lapply(lst5,length)0] names(lst6) # [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30 #[8] 2001.40 2002.40 2003.40 2004.40 lst6[2000.20] #$`2000.20` #$`2000.20`[[1]] # firm year industry dummy dimension #1 1 2000 20 0 2120 #21 5 2000 20 1 2189 # #$`2000.20`[[2]] # firm year industry dummy dimension #16 4 2000 20 0 3178 #31 7 2000 20 1 3245 # #$`2000.20`[[3]] # firm year industry dummy dimension #11 3 2000 20 1 4532 #6 2 2000 20 0 4890 A.K. From: Cecilia Carmo cecilia.ca...@ua.pt To: r-help@r-project.org r-help@r-project.org Cc: smartpink...@yahoo.com smartpink...@yahoo.com Sent: Friday, June 7, 2013 9:56 AM Subject: Re: [R] matched samples, dataframe, panel data Again my problem, better explained. #I have a data panel of thousands of firms, by year and industry and #one dummy variable that identifies one kind of firms (1 if the firm have an auditor; 0 if not) #and another variable the represents the firm dimension (total assets in thousand of euros) #I need to create two separated samples with the same number os firms where #one firm in the first have a corresponding firm in the second with the same #year, industry and dimension (the dimension doesn't need to be exatly the #same, it could vary in an interval of +/- 10%, for example) #My reproducible example firm1-sort(rep(1:10,5),decreasing=F) year1-rep(2000:2004,10) industry1-rep(20,50) dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1) dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789, 2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824, 23,2897,3456,7690,6022,3678,9431,2890) data1-data.frame(firm1,year1,industry1,dummy1,dimension1) data1 colnames(data1)-c(firm,year,industry,dummy,dimension) firm2-sort(rep(11:15,3),decreasing=F) year2-rep(2001:2003,5) industry2-rep(30,15) dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1) dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345) data2-data.frame(firm2,year2,industry2,dummy2,dimension2) data2 colnames(data2)-c(firm,year,industry,dummy,dimension) firm3-sort(rep(16:20,4),decreasing=F) year3-rep(2001:2004,5) industry3-rep(40,20) dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0) dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824, 23,2897) data3-data.frame(firm3,year3,industry3,dummy3,dimension3) data3 colnames(data3)-c(firm,year,industry,dummy,dimension) final1-rbind(data1,data2) final2-rbind(final1,data3) final2 final3-final2[order(final2$year,final2$industry,final2$dimension),] final3 #So my data is final3 is like this: firm year industry dummy dimension 26 6 2000 20 0 781 1 1 2000 20 0 2120 21 5 2000 20 1 2189 36 8 2000 20 1 2765 16 4 2000 20 0 3178 31 7 2000 20 1 3245 11 3 2000 20 1 4532 6 2 2000 20 0 4890 41 9 2000 20 0 5438 46 10 2000 20 0 7690 2 1 2001 20 0 345 37 8 2001 20 1 389 32 7 2001 20 0 1234 17 4 2001 20 0 2678 7 2 2001 20 1 2789 22 5 2001 20 1 4289 47 10 2001 20 0 6022 12 3 2001 20 1 6593 27 6 2001 20 0 35489 42 9 2001 20 1 37824 60 14 2001 30 1 2341 54 12 2001 30 0 2345 57 13 2001 30 1 3245 51 11 2001 30 0 12456 63 15 2001 30 1 12900 78 19 2001 40 1 389 74 18 2001 40 1 1288 82 20 2001 40 0 6438 70 17 2001 40 1 6754 66 16 2001 40 0 23456 43 9 2002 20 0 23 33 7 2002 20 1 25 3 1 2002 20 1 2341 28 6 2002 20 0 2345 8 2 2002 20 1 3412 48 10 2002 20 1 3678 18 4 2002 20 0 23 5 2002 20 0 8543 13 3 2002 20 0 12900 38 8 2002 20 1 23456 64 15 2002 30 0 123 52 11 2002 30 0 781 58 13 2002 30 1 2120 61 14 2002 30 1 5678 55 12 2002 30 0
[R] Horizontal labels on a plot
Hi, I have a plot with horizontal lables on the x-axis. For example: Devonian volcanic rocks n=2 with n=2 on a new line. How do I centre n=2 under the Devonian volcanic rocks label? This was my code: text(axis_text, par(usr)[three], labels = paste(LABELS, \n , n =, t(t(name.count[,two]))), srt = txt_di, adj = txtadj, xpd = TRUE, cex=txt_cex) where LABELS=Devonian volcanic rocks and name.count[,two] =2 Thanks -- Shane [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matched samples, dataframe, panel data
Hi, Not sure if this is what you wanted. res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) # this combines the list of lists to a data.frame res[1:4,] # firm year industry dummy dimension #1 1 2000 20 0 2120 #2 5 2000 20 1 2189 #3 4 2000 20 0 3178 #4 7 2000 20 1 3245 #or res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res)) row.names(res)-1:nrow(res) res[1:4,] # firm year industry dummy dimension group #1 1 2000 20 0 2120 2000.20 #1 group #2 5 2000 20 1 2189 2000.20 #1 #3 4 2000 20 0 3178 2000.20 #2 #4 7 2000 20 1 3245 2000.20 #2 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: Sent: Friday, June 7, 2013 11:33 AM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. Just a little thing: how can I put it like a dataframe? Thanks, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:27 Para: Cecilia Carmo Assunto: Re: [R] matched samples, dataframe, panel data Hi, There could be easier ways... I am a bit busy now to try other ways. - Original Message - From: arun smartpink...@yahoo.com To: Cecilia Carmo cecilia.ca...@ua.pt Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 11:25 AM Subject: Re: [R] matched samples, dataframe, panel data Hi, May be this helps: lst1-split(final3,list(final3$year,final3$industry)) lst2-lst1[lapply(lst1,nrow)0] lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y (x$dimension+x$dimension*0.1)) (y (x$dimension-x$dimension*0.1)),])) lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2]) lst5-lapply(lst4,function(x)x[!duplicated(x)]) lst6-lst5[lapply(lst5,length)0] names(lst6) # [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30 #[8] 2001.40 2002.40 2003.40 2004.40 lst6[2000.20] #$`2000.20` #$`2000.20`[[1]] # firm year industry dummy dimension #1 1 2000 20 0 2120 #21 5 2000 20 1 2189 # #$`2000.20`[[2]] # firm year industry dummy dimension #16 4 2000 20 0 3178 #31 7 2000 20 1 3245 # #$`2000.20`[[3]] # firm year industry dummy dimension #11 3 2000 20 1 4532 #6 2 2000 20 0 4890 A.K. From: Cecilia Carmo cecilia.ca...@ua.pt To: r-help@r-project.org r-help@r-project.org Cc: smartpink...@yahoo.com smartpink...@yahoo.com Sent: Friday, June 7, 2013 9:56 AM Subject: Re: [R] matched samples, dataframe, panel data Again my problem, better explained. #I have a data panel of thousands of firms, by year and industry and #one dummy variable that identifies one kind of firms (1 if the firm have an auditor; 0 if not) #and another variable the represents the firm dimension (total assets in thousand of euros) #I need to create two separated samples with the same number os firms where #one firm in the first have a corresponding firm in the second with the same #year, industry and dimension (the dimension doesn't need to be exatly the #same, it could vary in an interval of +/- 10%, for example) #My reproducible example firm1-sort(rep(1:10,5),decreasing=F) year1-rep(2000:2004,10) industry1-rep(20,50) dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1) dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789, 2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824, 23,2897,3456,7690,6022,3678,9431,2890) data1-data.frame(firm1,year1,industry1,dummy1,dimension1) data1 colnames(data1)-c(firm,year,industry,dummy,dimension) firm2-sort(rep(11:15,3),decreasing=F) year2-rep(2001:2003,5) industry2-rep(30,15) dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1) dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345) data2-data.frame(firm2,year2,industry2,dummy2,dimension2) data2 colnames(data2)-c(firm,year,industry,dummy,dimension) firm3-sort(rep(16:20,4),decreasing=F) year3-rep(2001:2004,5) industry3-rep(40,20) dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0) dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824, 23,2897) data3-data.frame(firm3,year3,industry3,dummy3,dimension3) data3 colnames(data3)-c(firm,year,industry,dummy,dimension) final1-rbind(data1,data2) final2-rbind(final1,data3) final2 final3-final2[order(final2$year,final2$industry,final2$dimension),] final3 #So my data is final3 is like this: firm year industry dummy dimension 26 6 2000 20 0 781 1 1 2000 20 0 2120 21 5 2000 20 1 2189
Re: [R] matched samples, dataframe, panel data
That's it, the first one! Thank you very much, Cecília Carmo De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:43 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Hi, Not sure if this is what you wanted. res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) # this combines the list of lists to a data.frame res[1:4,] # firm year industry dummy dimension #11 2000 20 0 2120 #25 2000 20 1 2189 #34 2000 20 0 3178 #47 2000 20 1 3245 #or res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res)) row.names(res)-1:nrow(res) res[1:4,] # firm year industry dummy dimension group #11 2000 20 0 2120 2000.20 #1 group #25 2000 20 1 2189 2000.20 #1 #34 2000 20 0 3178 2000.20 #2 #47 2000 20 1 3245 2000.20 #2 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: Sent: Friday, June 7, 2013 11:33 AM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. Just a little thing: how can I put it like a dataframe? Thanks, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:27 Para: Cecilia Carmo Assunto: Re: [R] matched samples, dataframe, panel data Hi, There could be easier ways... I am a bit busy now to try other ways. - Original Message - From: arun smartpink...@yahoo.com To: Cecilia Carmo cecilia.ca...@ua.pt Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 11:25 AM Subject: Re: [R] matched samples, dataframe, panel data Hi, May be this helps: lst1-split(final3,list(final3$year,final3$industry)) lst2-lst1[lapply(lst1,nrow)0] lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y (x$dimension+x$dimension*0.1)) (y (x$dimension-x$dimension*0.1)),])) lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2]) lst5-lapply(lst4,function(x)x[!duplicated(x)]) lst6-lst5[lapply(lst5,length)0] names(lst6) # [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30 #[8] 2001.40 2002.40 2003.40 2004.40 lst6[2000.20] #$`2000.20` #$`2000.20`[[1]] # firm year industry dummy dimension #1 1 2000 20 0 2120 #215 2000 20 1 2189 # #$`2000.20`[[2]] # firm year industry dummy dimension #164 2000 20 0 3178 #317 2000 20 1 3245 # #$`2000.20`[[3]] # firm year industry dummy dimension #113 2000 20 1 4532 #6 2 2000 20 0 4890 A.K. From: Cecilia Carmo cecilia.ca...@ua.pt To: r-help@r-project.org r-help@r-project.org Sent: Friday, June 7, 2013 9:56 AM Subject: Re: [R] matched samples, dataframe, panel data Again my problem, better explained. #I have a data panel of thousands of firms, by year and industry and #one dummy variable that identifies one kind of firms (1 if the firm have an auditor; 0 if not) #and another variable the represents the firm dimension (total assets in thousand of euros) #I need to create two separated samples with the same number os firms where #one firm in the first have a corresponding firm in the second with the same #year, industry and dimension (the dimension doesn't need to be exatly the #same, it could vary in an interval of +/- 10%, for example) #My reproducible example firm1-sort(rep(1:10,5),decreasing=F) year1-rep(2000:2004,10) industry1-rep(20,50) dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1) dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789, 2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824, 23,2897,3456,7690,6022,3678,9431,2890) data1-data.frame(firm1,year1,industry1,dummy1,dimension1) data1 colnames(data1)-c(firm,year,industry,dummy,dimension) firm2-sort(rep(11:15,3),decreasing=F) year2-rep(2001:2003,5) industry2-rep(30,15) dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1) dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345) data2-data.frame(firm2,year2,industry2,dummy2,dimension2) data2 colnames(data2)-c(firm,year,industry,dummy,dimension) firm3-sort(rep(16:20,4),decreasing=F) year3-rep(2001:2004,5) industry3-rep(40,20) dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0) dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824, 23,2897) data3-data.frame(firm3,year3,industry3,dummy3,dimension3) data3 colnames(data3)-c(firm,year,industry,dummy,dimension) final1-rbind(data1,data2) final2-rbind(final1,data3) final2
Re: [R] glm.nb error
On Jun 7, 2013, at 10:36 AM, Daofeng Li lid...@gmail.com wrote: Thank you Sarah and Marc for your fast and nice response. Apology for didn't include all information. I have a input file like following: gene1 18 15 13 13 16 9 20 24 gene2 15 8 8 7 0 12 18 4 gene3 10 9 8 12 9 11 12 12 gene4 4 0 4 3 0 5 0 0 gene5 0 1 0 0 1 5 1 0 gene6 3 3 3 3 4 4 4 4 gene7 0 4 0 2 2 2 0 0 gene8 4 4 7 3 0 6 6 12 gene9 11 6 13 10 13 7 12 9 gene106 3 6 3 4 7 6 3 I am using following R code to compare the difference between the 1st 4 numbers against 2nd 4 numbers: library(MASS) library(coin) suppressPackageStartupMessages(suppressWarnings(library(tcltk))) library(qvalue) library(plyr) dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) index=(apply(dat[,-1],1,sum)0) data = dat[index,] group=c(1,1,1,1,0,0,0,0) n=nrow(data) result=NULL for (i in 1:n){ print(i) y=as.numeric(data[i,-1]) if (all((y-mean(y))==0)) result=rbind(result,c(0,0,0,1)) else { fit=glm.nb(y~group) result=rbind(result,summary(fit)$coef[2,]) } } qval = qvalue(result[,4]) fdr=0.1 index=(qval$qvaluesfdr) dat.result = data[index,] write.table(dat.result,file=test_result,row.names=F,quote=F) If you use this input file and code, would reproduce the same error: Loading required package: methods Loading required package: survival Loading required package: splines Loading required package: mvtnorm Loading required package: modeltools Loading required package: stats4 Attaching package: ‘plyr’ The following object is masked from ‘package:modeltools’: empty [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: Warning messages: 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached 2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached Execution halted So might be the error was in 6th line, not the line I pasted before (5th line)? Sorry about that. Thanks. Daofeng Your 'y' at that point in the loop is: y [1] 3 3 3 3 4 4 4 4 and 'group' is: group [1] 1 1 1 1 0 0 0 0 glm.nb(y ~ group) Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Think about it... :-) Regards, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm.nb error
Hi, On Fri, Jun 7, 2013 at 11:36 AM, Daofeng Li lid...@gmail.com wrote: Thank you Sarah and Marc for your fast and nice response. Apology for didn't include all information. I have a input file like following: gene1 18 15 13 13 16 9 20 24 gene2 15 8 8 7 0 12 18 4 gene3 10 9 8 12 9 11 12 12 gene4 4 0 4 3 0 5 0 0 gene5 0 1 0 0 1 5 1 0 gene6 3 3 3 3 4 4 4 4 gene7 0 4 0 2 2 2 0 0 gene8 4 4 7 3 0 6 6 12 gene9 11 6 13 10 13 7 12 9 gene10 6 3 6 3 4 7 6 3 dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) Is this what's in the test file that your code reads in? We don't have that file, so can't run your code. If it is, then the output of dput(dat) would be enormously more useful than copy and pasting your data file into your email. And yes, the full code is very little like the pair of lines you originally pasted in. Sarah I am using following R code to compare the difference between the 1st 4 numbers against 2nd 4 numbers: library(MASS) library(coin) suppressPackageStartupMessages(suppressWarnings(library(tcltk))) library(qvalue) library(plyr) dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) index=(apply(dat[,-1],1,sum)0) data = dat[index,] group=c(1,1,1,1,0,0,0,0) n=nrow(data) result=NULL for (i in 1:n){ print(i) y=as.numeric(data[i,-1]) if (all((y-mean(y))==0)) result=rbind(result,c(0,0,0,1)) else { fit=glm.nb(y~group) result=rbind(result,summary(fit)$coef[2,]) } } qval = qvalue(result[,4]) fdr=0.1 index=(qval$qvaluesfdr) dat.result = data[index,] write.table(dat.result,file=test_result,row.names=F,quote=F) If you use this input file and code, would reproduce the same error: Loading required package: methods Loading required package: survival Loading required package: splines Loading required package: mvtnorm Loading required package: modeltools Loading required package: stats4 Attaching package: ‘plyr’ The following object is masked from ‘package:modeltools’: empty [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: Warning messages: 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached 2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached Execution halted So might be the error was in 6th line, not the line I pasted before (5th line)? Sorry about that. Thanks. Daofeng On Fri, Jun 7, 2013 at 10:15 AM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote: Dear R Community, I have encountered a problem while using the R function glm.nb. The code that produce the error was following two lines: group=c(1,1,1,1,0,0,0,0) fit=glm.nb(y~group) While the y contains 8 sets of number like: gene2750 1 0 0 1 5 1 0 Error message: Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: There were 50 or more warnings (use warnings() to see the first 50) Execution halted Information of my system: sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base Does anyone happen to have some hit on how to solve this? Appreciate for any response. Thanks in advance, Daofeng There is something wrong with your actual 'y' or 'group' that is not evident from the above info: group - c(1, 1, 1, 1, 0, 0, 0, 0) y - c(0, 1, 0, 0, 1, 5, 1, 0) require(MASS) Loading required package: MASS glm.nb(y ~ group) Call: glm.nb(formula = y ~ group, init.theta = 1.711564307, link = log) Coefficients: (Intercept)group 0.5596 -1.9459 Degrees of Freedom: 7 Total (i.e. Null); 6 Residual Null Deviance: 10.23 Residual Deviance: 6.848AIC: 25.25 Check str(y) and str(group) You should also be sure to note in your posts when you are using a function from a non-base package, in this case MASS, which is not indicated in your sessionInfo() above, so something is amiss there as well. Regards, Marc Schwartz -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
Re: [R] Conditional coloring of area between curves
that's exactly what I was looking for! That code has some problems if your time series do not cross at the time points. E.g., Duncan's code with some diagnostic plotting is: f0 - function (site1, site2, x = time(site1), check = FALSE) { stopifnot(length(x) == length(site1), length(x) == length(site2), all(diff(x) 0)) if (check) { oldMfrow - par(mfrow = c(2, 1)) on.exit(par(oldMfrow)) matplot(x, cbind(site1, site2), type = b) } smaller - pmin(site1, site2) plot(x, site1, ylim = range(c(site1, site2)), type = n) polygon(c(x, rev(x)), c(smaller, rev(site1)), col = red) polygon(c(x, rev(x)), c(smaller, rev(site2)), col = blue) } # See how it messes the crossings in f0(c(0, 1, -1, 0), c(1, 0, -2, 1), check=TRUE) The following is a quick and dirty way of adding all the crossing points to the original data sets so the plot is right. addCrossingPoints - function (y1, y2, x) { stopifnot(length(y1)==length(x), length(y2)==length(x), all(diff(x)0)) # isCrossingSegment[i]==TRUE means that y1 and y2 cross between #x[i] and x[i+1] (strictly, not at x[i] or x[i+1]). i - seq_len(length(y1) - 1) isCrossingSegment - ((y1[i] y2[i]) (y1[i+1] y2[i+1])) | ((y1[i] y2[i]) (y1[i+1] y2[i+1])) if (any(isCrossingSegment)) { i - which(isCrossingSegment) dx - x[i+1] - x[i] m1 - (y1[i+1] - y1[i]) / dx m2 - (y2[i+1] - y2[i]) / dx newDX - (y1[i] - y2[i]) / (m2 - m1) newY - y1[i] + newDX * m1 o - order( x - c(x, x[i] + newDX)) x - x[o] y1 - c(y1, newY)[o] y2 - c(y2, newY)[o] } list(y1=y1, y2=y2, x=x) } # Combine it with Duncan's code to get a better looking plot f1 - function(site1, site2, x=time(site1), check = FALSE) { tmp - addCrossingPoints(site1, site2, x) f0(tmp$y1, tmp$y2, tmp$x, check=check) } f1(c(0, 1, -1, 0), c(1, 0, -2, 1), check=TRUE) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Roland Pape Sent: Friday, June 07, 2013 2:51 AM To: Duncan Murdoch Cc: r-help@r-project.org Subject: Re: [R] Conditional coloring of area between curves Hi Duncan, that's exactly what I was looking for! The series have common times, so that's no problem. Thanks a lot! Roland Von: Duncan Murdoch [murdoch.dun...@gmail.com] Gesendet: Donnerstag, 6. Juni 2013 18:45 An: Roland Pape Cc: r-help@r-project.org Betreff: Re: [R] Conditional coloring of area between curves On 06/06/2013 10:41 AM, Roland Pape wrote: Dear list, I have two time series of temperatures from different sites plotted in the same diagram and would like to color the area between the curves differently, dependent on whether site 1 is cooler than site 2 (colored let's say in blue) or warmer (red). While polygone() works fine to color the enclosed area in general, I'm struggling with this conditional coloring. Any help is greatly appreciated! Suppose the series are named site1 and site2, and they have common times in a variable times. Then the following should do what you want: smaller - pmin(site1, site2) plot(x, site1, ylim = range(c(site1, site2)), type=n) polygon(c(x, rev(x)), c(smaller, rev(site1)), col=red) polygon(c(x, rev(x)), c(smaller, rev(site2)), col=blue) If the times for the two series are different it's a little harder; first you need to give them common times, and that will depend on how you decide to evaluate the values between observations. Probably linear interpolation (using approx()) is fine, but it's up to you. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matched samples, dataframe, panel data
Sorry, Something is not ok, because when I do nrow(subset(res,res$dummy==0)) [1] 22 nrow(subset(res,res$dummy==1)) [1] 24 There the number o observatios in the two subsets is not equal. Thanks again, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:43 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Hi, Not sure if this is what you wanted. res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) # this combines the list of lists to a data.frame res[1:4,] # firm year industry dummy dimension #11 2000 20 0 2120 #25 2000 20 1 2189 #34 2000 20 0 3178 #47 2000 20 1 3245 #or res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res)) row.names(res)-1:nrow(res) res[1:4,] # firm year industry dummy dimension group #11 2000 20 0 2120 2000.20 #1 group #25 2000 20 1 2189 2000.20 #1 #34 2000 20 0 3178 2000.20 #2 #47 2000 20 1 3245 2000.20 #2 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: Sent: Friday, June 7, 2013 11:33 AM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. Just a little thing: how can I put it like a dataframe? Thanks, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:27 Para: Cecilia Carmo Assunto: Re: [R] matched samples, dataframe, panel data Hi, There could be easier ways... I am a bit busy now to try other ways. - Original Message - From: arun smartpink...@yahoo.com To: Cecilia Carmo cecilia.ca...@ua.pt Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 11:25 AM Subject: Re: [R] matched samples, dataframe, panel data Hi, May be this helps: lst1-split(final3,list(final3$year,final3$industry)) lst2-lst1[lapply(lst1,nrow)0] lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y (x$dimension+x$dimension*0.1)) (y (x$dimension-x$dimension*0.1)),])) lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2]) lst5-lapply(lst4,function(x)x[!duplicated(x)]) lst6-lst5[lapply(lst5,length)0] names(lst6) # [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30 #[8] 2001.40 2002.40 2003.40 2004.40 lst6[2000.20] #$`2000.20` #$`2000.20`[[1]] # firm year industry dummy dimension #1 1 2000 20 0 2120 #215 2000 20 1 2189 # #$`2000.20`[[2]] # firm year industry dummy dimension #164 2000 20 0 3178 #317 2000 20 1 3245 # #$`2000.20`[[3]] # firm year industry dummy dimension #113 2000 20 1 4532 #6 2 2000 20 0 4890 A.K. From: Cecilia Carmo cecilia.ca...@ua.pt To: r-help@r-project.org r-help@r-project.org Sent: Friday, June 7, 2013 9:56 AM Subject: Re: [R] matched samples, dataframe, panel data Again my problem, better explained. #I have a data panel of thousands of firms, by year and industry and #one dummy variable that identifies one kind of firms (1 if the firm have an auditor; 0 if not) #and another variable the represents the firm dimension (total assets in thousand of euros) #I need to create two separated samples with the same number os firms where #one firm in the first have a corresponding firm in the second with the same #year, industry and dimension (the dimension doesn't need to be exatly the #same, it could vary in an interval of +/- 10%, for example) #My reproducible example firm1-sort(rep(1:10,5),decreasing=F) year1-rep(2000:2004,10) industry1-rep(20,50) dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1) dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789, 2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824, 23,2897,3456,7690,6022,3678,9431,2890) data1-data.frame(firm1,year1,industry1,dummy1,dimension1) data1 colnames(data1)-c(firm,year,industry,dummy,dimension) firm2-sort(rep(11:15,3),decreasing=F) year2-rep(2001:2003,5) industry2-rep(30,15) dummy2-c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1) dimension2-c(12456,781,32489,2345,5754,8976,3245,2120,345,2341,5678,10900,12900,123,2345) data2-data.frame(firm2,year2,industry2,dummy2,dimension2) data2 colnames(data2)-c(firm,year,industry,dummy,dimension) firm3-sort(rep(16:20,4),decreasing=F) year3-rep(2001:2004,5) industry3-rep(40,20) dummy3-c(0,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,0,1,0,0) dimension3-c(23456,1181,32489,2345,6754,8976,3245,1234,1288,1200,2345,2765,389,23456,2367,3892,6438,24824, 23,2897) data3-data.frame(firm3,year3,industry3,dummy3,dimension3) data3
Re: [R] glm.nb error
As Marc already pointed out, take a close look at this part of your loop: R i - 6 R R y - as.numeric(data[i,-1]) R y [1] 3 3 3 3 4 4 4 4 R group [1] 1 1 1 1 0 0 0 0 R fit - glm.nb(y~group) Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed What do you expect to happen there? In general, it's a better practice to pre-specify the size of result (eg matrix(NA, nrow=n, ncol=4) ) and fill it as you go, rather than using rbind() within a loop. Much more memory-efficient. Sarah On Fri, Jun 7, 2013 at 11:58 AM, Daofeng Li lid...@gmail.com wrote: Sorry Sarah. dput(dat) structure(list(gene = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L), .Label = c(gene1, gene10, gene2, gene3, gene4, gene5, gene6, gene7, gene8, gene9), class = factor), b1 = c(18L, 15L, 10L, 4L, 0L, 3L, 0L, 4L, 11L, 6L), b2 = c(15L, 8L, 9L, 0L, 1L, 3L, 4L, 4L, 6L, 3L), b3 = c(13L, 8L, 8L, 4L, 0L, 3L, 0L, 7L, 13L, 6L), b4 = c(13L, 7L, 12L, 3L, 0L, 3L, 2L, 3L, 10L, 3L), c1 = c(16L, 0L, 9L, 0L, 1L, 4L, 2L, 0L, 13L, 4L), c2 = c(9L, 12L, 11L, 5L, 5L, 4L, 2L, 6L, 7L, 7L), c3 = c(20L, 18L, 12L, 0L, 1L, 4L, 0L, 6L, 12L, 6L), c4 = c(24L, 4L, 12L, 0L, 0L, 4L, 0L, 12L, 9L, 3L)), .Names = c(gene, b1, b2, b3, b4, c1, c2, c3, c4), class = data.frame, row.names = c(NA, -10L)) above was the dput(dat). Thanks. Daofeng On Fri, Jun 7, 2013 at 10:47 AM, Sarah Goslee sarah.gos...@gmail.com wrote: Hi, On Fri, Jun 7, 2013 at 11:36 AM, Daofeng Li lid...@gmail.com wrote: Thank you Sarah and Marc for your fast and nice response. Apology for didn't include all information. I have a input file like following: gene1 18 15 13 13 16 9 20 24 gene2 15 8 8 7 0 12 18 4 gene3 10 9 8 12 9 11 12 12 gene4 4 0 4 3 0 5 0 0 gene5 0 1 0 0 1 5 1 0 gene6 3 3 3 3 4 4 4 4 gene7 0 4 0 2 2 2 0 0 gene8 4 4 7 3 0 6 6 12 gene9 11 6 13 10 13 7 12 9 gene10 6 3 6 3 4 7 6 3 dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) Is this what's in the test file that your code reads in? We don't have that file, so can't run your code. If it is, then the output of dput(dat) would be enormously more useful than copy and pasting your data file into your email. And yes, the full code is very little like the pair of lines you originally pasted in. Sarah I am using following R code to compare the difference between the 1st 4 numbers against 2nd 4 numbers: library(MASS) library(coin) suppressPackageStartupMessages(suppressWarnings(library(tcltk))) library(qvalue) library(plyr) dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) index=(apply(dat[,-1],1,sum)0) data = dat[index,] group=c(1,1,1,1,0,0,0,0) n=nrow(data) result=NULL for (i in 1:n){ print(i) y=as.numeric(data[i,-1]) if (all((y-mean(y))==0)) result=rbind(result,c(0,0,0,1)) else { fit=glm.nb(y~group) result=rbind(result,summary(fit)$coef[2,]) } } qval = qvalue(result[,4]) fdr=0.1 index=(qval$qvaluesfdr) dat.result = data[index,] write.table(dat.result,file=test_result,row.names=F,quote=F) If you use this input file and code, would reproduce the same error: Loading required package: methods Loading required package: survival Loading required package: splines Loading required package: mvtnorm Loading required package: modeltools Loading required package: stats4 Attaching package: ‘plyr’ The following object is masked from ‘package:modeltools’: empty [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: Warning messages: 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached 2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached Execution halted So might be the error was in 6th line, not the line I pasted before (5th line)? Sorry about that. Thanks. Daofeng On Fri, Jun 7, 2013 at 10:15 AM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote: Dear R Community, I have encountered a problem while using the R function glm.nb. The code that produce the error was following two lines: group=c(1,1,1,1,0,0,0,0) fit=glm.nb(y~group) While the y contains 8 sets of number like: gene2750 1 0 0 1 5 1 0 Error message: Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: There were 50 or more warnings (use warnings() to see the first 50) Execution halted Information of my system: sessionInfo() R version 3.0.1
Re: [R] matched samples, dataframe, panel data
Sorry, it included a case with both values for dummy were 1. lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]}) res-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) nrow(subset(res,dummy==0)) #[1] 22 nrow(subset(res,dummy==1)) #[1] 22 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 12:03 PM Subject: RE: [R] matched samples, dataframe, panel data Sorry, Something is not ok, because when I do nrow(subset(res,res$dummy==0)) [1] 22 nrow(subset(res,res$dummy==1)) [1] 24 There the number o observatios in the two subsets is not equal. Thanks again, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:43 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Hi, Not sure if this is what you wanted. res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) # this combines the list of lists to a data.frame res[1:4,] # firm year industry dummy dimension #1 1 2000 20 0 2120 #2 5 2000 20 1 2189 #3 4 2000 20 0 3178 #4 7 2000 20 1 3245 #or res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res)) row.names(res)-1:nrow(res) res[1:4,] # firm year industry dummy dimension group #1 1 2000 20 0 2120 2000.20 #1 group #2 5 2000 20 1 2189 2000.20 #1 #3 4 2000 20 0 3178 2000.20 #2 #4 7 2000 20 1 3245 2000.20 #2 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: Sent: Friday, June 7, 2013 11:33 AM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. Just a little thing: how can I put it like a dataframe? Thanks, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:27 Para: Cecilia Carmo Assunto: Re: [R] matched samples, dataframe, panel data Hi, There could be easier ways... I am a bit busy now to try other ways. - Original Message - From: arun smartpink...@yahoo.com To: Cecilia Carmo cecilia.ca...@ua.pt Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 11:25 AM Subject: Re: [R] matched samples, dataframe, panel data Hi, May be this helps: lst1-split(final3,list(final3$year,final3$industry)) lst2-lst1[lapply(lst1,nrow)0] lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y (x$dimension+x$dimension*0.1)) (y (x$dimension-x$dimension*0.1)),])) lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2]) lst5-lapply(lst4,function(x)x[!duplicated(x)]) lst6-lst5[lapply(lst5,length)0] names(lst6) # [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30 #[8] 2001.40 2002.40 2003.40 2004.40 lst6[2000.20] #$`2000.20` #$`2000.20`[[1]] # firm year industry dummy dimension #1 1 2000 20 0 2120 #21 5 2000 20 1 2189 # #$`2000.20`[[2]] # firm year industry dummy dimension #16 4 2000 20 0 3178 #31 7 2000 20 1 3245 # #$`2000.20`[[3]] # firm year industry dummy dimension #11 3 2000 20 1 4532 #6 2 2000 20 0 4890 A.K. From: Cecilia Carmo cecilia.ca...@ua.pt To: r-help@r-project.org r-help@r-project.org Cc: smartpink...@yahoo.com smartpink...@yahoo.com Sent: Friday, June 7, 2013 9:56 AM Subject: Re: [R] matched samples, dataframe, panel data Again my problem, better explained. #I have a data panel of thousands of firms, by year and industry and #one dummy variable that identifies one kind of firms (1 if the firm have an auditor; 0 if not) #and another variable the represents the firm dimension (total assets in thousand of euros) #I need to create two separated samples with the same number os firms where #one firm in the first have a corresponding firm in the second with the same #year, industry and dimension (the dimension doesn't need to be exatly the #same, it could vary in an interval of +/- 10%, for example) #My reproducible example firm1-sort(rep(1:10,5),decreasing=F) year1-rep(2000:2004,10) industry1-rep(20,50) dummy1-c(0,0,1,1,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1) dimension1-c(2120,345,2341,5678,10900,4890,2789,3412,9500,8765,4532,6593,12900,123,2345,3178,2678,,647,23789, 2189,4289,8543,637,23456,781,35489,2345,5754,8976,3245,1234,25,1200,2345,2765,389,23456,2367,3892,5438,37824, 23,2897,3456,7690,6022,3678,9431,2890) data1-data.frame(firm1,year1,industry1,dummy1,dimension1) data1 colnames(data1)-c(firm,year,industry,dummy,dimension) firm2-sort(rep(11:15,3),decreasing=F)
[R] coxme p-values
Hi there does anybody know if the p-values in coxme are given one- or two-tailed by default? Thank you David -Messaggio originale- Da: r-help-boun...@r-project.org per conto di r-help-requ...@r-project.org Inviato: gio 06/06/2013 11.00 A: r-help@r-project.org Oggetto: R-help Digest, Vol 124, Issue 7 Send R-help mailing list submissions to r-help@r-project.org To subscribe or unsubscribe via the World Wide Web, visit https://stat.ethz.ch/mailman/listinfo/r-help or, via email, send a message with subject or body 'help' to r-help-requ...@r-project.org You can reach the person managing the list at r-help-ow...@r-project.org When replying, please edit your Subject line so it is more specific than Re: Contents of R-help digest... Today's Topics: 1. Re: How to display multiples lines with different color on the same plot? (Rui Barradas) 2. Re: Post hoc power analysis for mixed-effects models (David Winsemius) 3. Re: how to compute maximum of fitted polynomial? (David Winsemius) 4. Re: how to compute maximum of fitted polynomial? (Hans W Borchers) 5. Re: How to display multiples lines with different color on the same plot? (David Winsemius) 6. Re: read.csv and write.csv filtering for very big data ? (Duncan Murdoch) 7. Re: Survival aareg problem (Terry Therneau) 8. Re: Post hoc power analysis for mixed-effects models (Ben Bolker) 9. SPlus script (Scott Raynaud) 10. Re: dates and time series management (arun) 11. Re: Loop FOR with histogram() from lattice (Jeff Newmiller) 12. Re: Lattice, ggplot, and pointsize (Deepayan Sarkar) 13. Re: how to compute maximum of fitted polynomial? (Joseph Clark) 14. Re: How to write a loop in R to select multiple regression model and validate it ? (beginner) 15. Coefficients paths - comparison of ridge, lasso and elastic net regression (beginner) 16. sample {base} (Xochitl CORMON) 17. Re: split and common variables (Nico Met) 18. Re: Trying to build up functions with its names by means of lapply (Julio Sergio) 19. Re: read.csv and write.csv filtering for very big data ? (Duncan Murdoch) 20. reshape2 issue (Neotropical bat risk assessments) 21. Re: sample {base} (Sarah Goslee) 22. Re: sample {base} (David Carlson) 23. Re: SPlus script (Pascal Oettli) 24. Re: reshape2 issue (Pascal Oettli) 25. Re: split and common variables (arun) 26. Re: sample {base} (Xochitl CORMON) 27. reshape2 issue continued (Neotropical bat risk assessments) 28. Re: dates and time series management (arun) 29. Re: split and common variables (David Carlson) 30. Re: reshape2 issue continued (Thomas Adams) 31. Re: reshape2 issue continued (Ista Zahn) 32. Re: Trying to build up functions with its names by means of lapply (Bert Gunter) 33. Re: plyr _aply simplifying return-value dimensions (Adams, Jean) 34. Re: sample {base} (arun) 35. Re: SPlus script (Scott Raynaud) 36. Re: reshape2 issue continued (John Kane) 37. Re: SPlus script (peter dalgaard) 38. Re: SPlus script (William Dunlap) 39. reshape2 issue solved Tnx! (Neotropical bat risk assessments) 40. Re: How to display multiples lines with different color on the same plot? (John Kane) 41. Re: reshape2 issue solved Tnx! (John Kane) 42. Re: Trying to build up functions with its names by means of lapply (Julio Sergio) 43. Re: Trying to build up functions with its names by means of lapply (William Dunlap) 44. Re: Lattice, ggplot, and pointsize (Milan Bouchet-Valat) 45. Re: SPlus script (Scott Raynaud) 46. Re: Trying to build up functions with its names by means of lapply (Julio Sergio) 47. Re: probability weights in multilevel models (Thomas Lumley) 48. Re: Trying to build up functions with its names bymeans of lapply (William Dunlap) 49. multilevel binary and ordered regression models (Xu Jun) 50. Re: dates and time series management (arun) 51. rJava is not loading (Dimitri Liakhovitski) 52. Map Antarctica (ejb) 53. List into table (Andtrei89) 54. sandwich matrix in gamm (Isabel Proen?a) 55. R error- more columns than column names (Sandro Magalh?es) 56. Subsetting out missing values for a certain variable (Daniel Tucker) 57. Re: Subsetting out missing values for a certain variable (Daniel Tucker) 58. Re: Subsetting out missing values for a certain variable (Daniel Tucker) 59. Re: read.csv and write.csv filtering for very big data ? (ivo welch) 60. reshape2 issue (Bruce Miller) 61. Re: installing package 'rqpd' (Regression quantiles for panel data) (Manish K. Srivastava) 62. Temporally autocorrelated Poisson data (Ali Swanson) 63. Error in anova.cca by=axis after Vegan update (Edwards, Danielle) 64. combining two different matrizes (ThomasH) 65. Re: Subsetting out missing values for a certain variable (Jorge I Velez) 66. Re: Map
[R] Trying to install NADA in R 3.0.0
Hello folks, Im trying to install the NADA package in R 3.0.0 It has been archived so I tried to downloading it and installing it locally. I get utils:::menuInstallLocal() package NADA successfully unpacked and MD5 sums checked But then when I load it I get library(NADA) Error in library(NADA) : NADA is not a valid installed package Any sugestions [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Distance-based non-parametric ANOVA
Yes. vegan and TreMineR. Thank you! On Friday, June 07, 2013 10:52:02 Jose Iparraguirre wrote: Hi Mikhail, After a cursory online search using simply A new method for non-parametric multivariate analysis + CRAN + R, I got the following three packages which include this paper in their references: vegan, mefa, and TraMineR. I don't know whether they deal with the procedure you are after, but it would be worth having a look. Regards, José Prof. José Iparraguirre Chief Economist Age UK Age UK Tavis House, 1- 6 Tavistock Square London, WC1H 9NB -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mikhail Umorin Sent: 06 June 2013 21:35 To: r-help@r-project.org Subject: [R] Distance-based non-parametric ANOVA Hello, I am comparing treatments by comparing within group to between group distances like described in MJ Anderson. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26: 32 -- 46. The idea is to find the ratio of within group sum of distance^2 to between group sum of distance^2. Then find the distribution of the statistic by permuting the sample. Has anyone done this before in R code? Are there any packages (I did checked on CRAN - none). Code? I want to make sure I won't be inventing a bicycle before I write my own package. Comments? Suggestions? (I have never written an R package before) Thank you for your time, Mikhail. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Age UK Improving later life www.ageuk.org.uk --- Age UK is a registered charity and company limited by guarantee, (registered charity number 1128267, registered company number 6825798). Registered office: Tavis House, 1-6 Tavistock Square, London WC1H 9NA. For the purposes of promoting Age UK Insurance, Age UK is an Appointed Representative of Age UK Enterprises Limited, Age UK is an Introducer Appointed Representative of JLT Benefit Solutions Limited and Simplyhealth Access for the purposes of introducing potential annuity and health cash plans customers respectively. Age UK Enterprises Limited, JLT Benefit Solutions Limited and Simplyhealth Access are all authorised and regulated by the Financial Services Authority. -- This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you receive a message in error, please advise the sender and delete immediately. Except where this email is sent in the usual course of our business, any opinions expressed in this email are those of the author and do not necessarily reflect the opinions of Age UK or its subsidiaries and associated companies. Age UK monitors all e-mail transmissions passing through its network and may block or modify mails which are deemed to be unsuitable. Age Concern England (charity number 261794) and Help the Aged (charity number 272786) and their trading and other associated companies merged on 1st April 2009. Together they have formed the Age UK Group, dedicated to improving the lives of people in later life. The three national Age Concerns in Scotland, Northern Ireland and Wales have also merged with Help the Aged in these nations to form three registered charities: Age Scotland, [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Clogit R and Stata
From: peter dalgaard pda...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Sent: Friday, June 7, 2013 11:12 AM Subject: Re: [R] Clogit R and Stata Here is the R output: Call: coxph(formula = Surv(rep(1, 1404L), sftpcons) ~ sftptv2a3 + sftptv2a4 +   sftptv2a5 + sftptv2a2 + sftptv2a6 + logim + maccat + disp4cat +   strata(stratida), data = dframe, method = exact)  n= 1404, number of events= 351        coef      exp(coef) se(coef)    z Pr(|z|)   sftptv2a3  1.4552   4.2852  0.2273  6.401 1.54e-10 *** sftptv2a4  3.1118  22.4609  0.2265 13.739  2e-16 *** sftptv2a5  1.0717   2.9204  0.2522  4.249 2.15e-05 *** sftptv2a2  0.7185   2.0514  0.3300  2.177  0.0295 *  sftptv2a6  2.7341  15.3965  0.5050  5.414 6.17e-08 *** logim    0.7579   2.1338  0.1347  5.625 1.85e-08 *** maccat   3.0809  21.7771  0.4005  7.693 1.43e-14 *** disp4cat  0.7061   2.0261  0.1524  4.634 3.59e-06 *** --- Signif. codes:  0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1           exp(coef) exp(-coef) lower .95 upper .95 sftptv2a3   4.285   0.23336   2.745   6.691 sftptv2a4   22.461   0.04452   14.409   35.013 sftptv2a5   2.920   0.34241   1.781   4.788 sftptv2a2   2.051   0.48747   1.074   3.917 sftptv2a6   15.397   0.06495   5.722   41.429 logim     2.134   0.46866   1.639   2.779 maccat    21.777   0.04592   9.934   47.739 disp4cat    2.026   0.49355   1.503   2.731 Rsquare= 0.239  (max possible= 0.623 ) Likelihood ratio test= 383.2  on 8 df,  p=0 Wald test       = 264.7  on 8 df,  p=0 Score (logrank) test = 396.2  on 8 df,  p=0 And the STATA output: Iteration 0:  log likelihood = -95.537697  Iteration 1:  log likelihood = -91.465581  Iteration 2:  log likelihood = -91.402366  Iteration 3:  log likelihood = -91.402264  Iteration 4:  log likelihood = -91.402264  Conditional (fixed-effects) logistic regression  Number of obs  =     468                          LR chi2(8)    =   141.59                          Prob chi2   =   0. Log likelihood = -91.402264            Pseudo R2    =   0.4365 --   sftpcons |    Coef.  Std. Err.    z   P|z|   [95% Conf. Interval] -+   sftptv2a3 |  2.042827  .4741327   4.31  0.000   1.113544   2.97211   sftptv2a4 |   4.10828  .5593723   7.34  0.000    3.01193   5.204629   sftptv2a5 |  1.766492  .5585173   3.16  0.002   .6718177   2.861165   sftptv2a2 |  1.366568  .6540307   2.09  0.037    .084691   2.648444   sftptv2a6 |  2.307152  .8225835   2.80  0.005   .6949178   3.919386     logim |  1.404135  .3480976   4.03  0.000   .7218764   2.086394    maccat |   2.8423  .7008588   4.06  0.000   1.468642   4.215958   disp4cat |  .6347805  .2872258   2.21  0.027   .0718283   1.197733 -- Also tried changing method=approximate with no noticeable change On Jun 7, 2013, at 15:34 , Richard Beckett wrote: Sorry to once again write a message but I'm once again stumped and am having no luck finding a solution anywhere else. This question requires some finesse in both R and STATA so hopefully I will be able to get an answer here. I am much more adept in R and am trying to replicate the results of a STATA file in R. Hopefully this is a proper forum for such questions. This is the code for the clogit in STATA clogit sftpcons sftptv2a3 sftptv2a4 sftptv2a5 sftptv2a2 sftptv2a6 logim maccat disp4cat if sample==1 glb_ind==Y, group(stratida) and I tried to replicate it using clogit1-clogit(sftpcons~sftptv2a3+sftptv2a4+sftptv2a5+sftptv2a2+sftptv2a6+logim+maccat+disp4cat+strata(stratida), dframe, sample==1 | glb_ind==Y) but got different results What did I do wrong here? I interpreted the STATA clogit as run this logit as long as the sample is 1 and glb_ind=Y What should I be doing instead? An rather than | in the R version might help. Other than that, we're a bit short on clues unless you provide some output. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com [[alternative HTML version
Re: [R] Distance-based non-parametric ANOVA
I shall use this from now on. Thank you. On Friday, June 07, 2013 19:19:55 Pascal Oettli wrote: Hello, You also can use the package sos findFn('non-parametric multivariate ANOVA') Regards,Pascal 2013/6/7 Mikhail Umorin mike...@gmail.com[1] Hello, I am comparing treatments by comparing within group to between group distances likedescribed in MJ Anderson. 2001. A new method for non-parametric multivariate analysis of variance.Austral Ecology 26: 32 -- 46. The idea is to find the ratio of within group sum of distance^2 to between group sum ofdistance^2. Then find the distribution of the statistic by permuting the sample. Has anyone done this before in R code? Are there any packages (I did checked onCRAN - none). Code? I want to make sure I won't be inventing a bicycle before I write my own package. Comments? Suggestions? (I have never written an R package before) Thank you for your time, Mikhail. [[alternative HTML version deleted]] __ R-help@r-project.org[2] mailing list https://stat.ethz.ch/mailman/listinfo/r-help[3] http://www.R-project.org/posting-guide.html[4] [1] mailto:mike...@gmail.com [2] mailto:R-help@r-project.org [3] https://stat.ethz.ch/mailman/listinfo/r-help [4] http://www.R-project.org/posting-guide.html [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm.nb error
Sorry Sarah. dput(dat) structure(list(gene = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L), .Label = c(gene1, gene10, gene2, gene3, gene4, gene5, gene6, gene7, gene8, gene9), class = factor), b1 = c(18L, 15L, 10L, 4L, 0L, 3L, 0L, 4L, 11L, 6L), b2 = c(15L, 8L, 9L, 0L, 1L, 3L, 4L, 4L, 6L, 3L), b3 = c(13L, 8L, 8L, 4L, 0L, 3L, 0L, 7L, 13L, 6L), b4 = c(13L, 7L, 12L, 3L, 0L, 3L, 2L, 3L, 10L, 3L), c1 = c(16L, 0L, 9L, 0L, 1L, 4L, 2L, 0L, 13L, 4L), c2 = c(9L, 12L, 11L, 5L, 5L, 4L, 2L, 6L, 7L, 7L), c3 = c(20L, 18L, 12L, 0L, 1L, 4L, 0L, 6L, 12L, 6L), c4 = c(24L, 4L, 12L, 0L, 0L, 4L, 0L, 12L, 9L, 3L)), .Names = c(gene, b1, b2, b3, b4, c1, c2, c3, c4), class = data.frame, row.names = c(NA, -10L)) above was the dput(dat). Thanks. Daofeng On Fri, Jun 7, 2013 at 10:47 AM, Sarah Goslee sarah.gos...@gmail.comwrote: Hi, On Fri, Jun 7, 2013 at 11:36 AM, Daofeng Li lid...@gmail.com wrote: Thank you Sarah and Marc for your fast and nice response. Apology for didn't include all information. I have a input file like following: gene1 18 15 13 13 16 9 20 24 gene2 15 8 8 7 0 12 18 4 gene3 10 9 8 12 9 11 12 12 gene4 4 0 4 3 0 5 0 0 gene5 0 1 0 0 1 5 1 0 gene6 3 3 3 3 4 4 4 4 gene7 0 4 0 2 2 2 0 0 gene8 4 4 7 3 0 6 6 12 gene9 11 6 13 10 13 7 12 9 gene10 6 3 6 3 4 7 6 3 dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) Is this what's in the test file that your code reads in? We don't have that file, so can't run your code. If it is, then the output of dput(dat) would be enormously more useful than copy and pasting your data file into your email. And yes, the full code is very little like the pair of lines you originally pasted in. Sarah I am using following R code to compare the difference between the 1st 4 numbers against 2nd 4 numbers: library(MASS) library(coin) suppressPackageStartupMessages(suppressWarnings(library(tcltk))) library(qvalue) library(plyr) dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) index=(apply(dat[,-1],1,sum)0) data = dat[index,] group=c(1,1,1,1,0,0,0,0) n=nrow(data) result=NULL for (i in 1:n){ print(i) y=as.numeric(data[i,-1]) if (all((y-mean(y))==0)) result=rbind(result,c(0,0,0,1)) else { fit=glm.nb(y~group) result=rbind(result,summary(fit)$coef[2,]) } } qval = qvalue(result[,4]) fdr=0.1 index=(qval$qvaluesfdr) dat.result = data[index,] write.table(dat.result,file=test_result,row.names=F,quote=F) If you use this input file and code, would reproduce the same error: Loading required package: methods Loading required package: survival Loading required package: splines Loading required package: mvtnorm Loading required package: modeltools Loading required package: stats4 Attaching package: plyr The following object is masked from package:modeltools: empty [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: Warning messages: 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached 2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached Execution halted So might be the error was in 6th line, not the line I pasted before (5th line)? Sorry about that. Thanks. Daofeng On Fri, Jun 7, 2013 at 10:15 AM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote: Dear R Community, I have encountered a problem while using the R function glm.nb. The code that produce the error was following two lines: group=c(1,1,1,1,0,0,0,0) fit=glm.nb(y~group) While the y contains 8 sets of number like: gene2750 1 0 0 1 5 1 0 Error message: Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: There were 50 or more warnings (use warnings() to see the first 50) Execution halted Information of my system: sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base Does anyone happen to have some hit on how to solve this? Appreciate for any response. Thanks in advance, Daofeng
Re: [R] glm.nb error
Thank you Sarah and Marc for your fast and nice response. Apology for didn't include all information. I have a input file like following: gene1 18 15 13 13 16 9 20 24 gene2 15 8 8 7 0 12 18 4 gene3 10 9 8 12 9 11 12 12 gene4 4 0 4 3 0 5 0 0 gene5 0 1 0 0 1 5 1 0 gene6 3 3 3 3 4 4 4 4 gene7 0 4 0 2 2 2 0 0 gene8 4 4 7 3 0 6 6 12 gene9 11 6 13 10 13 7 12 9 gene10 6 3 6 3 4 7 6 3 I am using following R code to compare the difference between the 1st 4 numbers against 2nd 4 numbers: library(MASS) library(coin) suppressPackageStartupMessages(suppressWarnings(library(tcltk))) library(qvalue) library(plyr) dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) index=(apply(dat[,-1],1,sum)0) data = dat[index,] group=c(1,1,1,1,0,0,0,0) n=nrow(data) result=NULL for (i in 1:n){ print(i) y=as.numeric(data[i,-1]) if (all((y-mean(y))==0)) result=rbind(result,c(0,0,0,1)) else { fit=glm.nb(y~group) result=rbind(result,summary(fit)$coef[2,]) } } qval = qvalue(result[,4]) fdr=0.1 index=(qval$qvaluesfdr) dat.result = data[index,] write.table(dat.result,file=test_result,row.names=F,quote=F) If you use this input file and code, would reproduce the same error: Loading required package: methods Loading required package: survival Loading required package: splines Loading required package: mvtnorm Loading required package: modeltools Loading required package: stats4 Attaching package: plyr The following object is masked from package:modeltools: empty [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: Warning messages: 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached 2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached Execution halted So might be the error was in 6th line, not the line I pasted before (5th line)? Sorry about that. Thanks. Daofeng On Fri, Jun 7, 2013 at 10:15 AM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote: Dear R Community, I have encountered a problem while using the R function glm.nb. The code that produce the error was following two lines: group=c(1,1,1,1,0,0,0,0) fit=glm.nb(y~group) While the y contains 8 sets of number like: gene2750 1 0 0 1 5 1 0 Error message: Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: There were 50 or more warnings (use warnings() to see the first 50) Execution halted Information of my system: sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base Does anyone happen to have some hit on how to solve this? Appreciate for any response. Thanks in advance, Daofeng There is something wrong with your actual 'y' or 'group' that is not evident from the above info: group - c(1, 1, 1, 1, 0, 0, 0, 0) y - c(0, 1, 0, 0, 1, 5, 1, 0) require(MASS) Loading required package: MASS glm.nb(y ~ group) Call: glm.nb(formula = y ~ group, init.theta = 1.711564307, link = log) Coefficients: (Intercept)group 0.5596 -1.9459 Degrees of Freedom: 7 Total (i.e. Null); 6 Residual Null Deviance: 10.23 Residual Deviance: 6.848AIC: 25.25 Check str(y) and str(group) You should also be sure to note in your posts when you are using a function from a non-base package, in this case MASS, which is not indicated in your sessionInfo() above, so something is amiss there as well. Regards, Marc Schwartz [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SQL queries in R
Hey all! I am trying to select a bunch of id's (data type -character) from a table and store them in a variable in R But when i do this, it automatically truncates the leading zero's in id's even though they are of character type. code is :- myconn-odbcConnect(testdata) sql.select-paste(select UNIT_ID from UNITS where (UNIT_TYPE=',unit,' and COMMUNITY=',property,'),sep=) unit_ids-sqlQuery(myconn,sql.select) print(unit_ids) is there anyway i can retain the UNIT_ID's as they are. Thanks! Sneha [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm.nb error
Thanks Marc...still didn't get it... Tried this... str(group) num [1:8] 1 1 1 1 0 0 0 0 fit=glm.nb(y~group) Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed group - c('1','1','1','1','0','0','0','0') str(group) chr [1:8] 1 1 1 1 0 0 0 0 fit=glm.nb(y~group) Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Daofeng On Fri, Jun 7, 2013 at 10:48 AM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 7, 2013, at 10:36 AM, Daofeng Li lid...@gmail.com wrote: Thank you Sarah and Marc for your fast and nice response. Apology for didn't include all information. I have a input file like following: gene1 18 15 13 13 16 9 20 24 gene2 15 8 8 7 0 12 18 4 gene3 10 9 8 12 9 11 12 12 gene4 4 0 4 3 0 5 0 0 gene5 0 1 0 0 1 5 1 0 gene6 3 3 3 3 4 4 4 4 gene7 0 4 0 2 2 2 0 0 gene8 4 4 7 3 0 6 6 12 gene9 11 6 13 10 13 7 12 9 gene106 3 6 3 4 7 6 3 I am using following R code to compare the difference between the 1st 4 numbers against 2nd 4 numbers: library(MASS) library(coin) suppressPackageStartupMessages(suppressWarnings(library(tcltk))) library(qvalue) library(plyr) dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) index=(apply(dat[,-1],1,sum)0) data = dat[index,] group=c(1,1,1,1,0,0,0,0) n=nrow(data) result=NULL for (i in 1:n){ print(i) y=as.numeric(data[i,-1]) if (all((y-mean(y))==0)) result=rbind(result,c(0,0,0,1)) else { fit=glm.nb(y~group) result=rbind(result,summary(fit)$coef[2,]) } } qval = qvalue(result[,4]) fdr=0.1 index=(qval$qvaluesfdr) dat.result = data[index,] write.table(dat.result,file=test_result,row.names=F,quote=F) If you use this input file and code, would reproduce the same error: Loading required package: methods Loading required package: survival Loading required package: splines Loading required package: mvtnorm Loading required package: modeltools Loading required package: stats4 Attaching package: plyr The following object is masked from package:modeltools: empty [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: Warning messages: 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached 2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached Execution halted So might be the error was in 6th line, not the line I pasted before (5th line)? Sorry about that. Thanks. Daofeng Your 'y' at that point in the loop is: y [1] 3 3 3 3 4 4 4 4 and 'group' is: group [1] 1 1 1 1 0 0 0 0 glm.nb(y ~ group) Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Think about it... :-) Regards, Marc Schwartz [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Distance-based non-parametric ANOVA
Thank you for the reference. Very helpful Mikhail On Friday, June 07, 2013 12:06:22 R Heberto Ghezzo, Dr wrote: You better see the original book Permutation methods - Mielke Jr Paul J, Berry Kenneth J. The programs in Fortran for the tests are in www.stat.colostate.edu/permute R.Heberto Ghezzo Ph.D. Montreal - Canada From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Mikhail Umorin [mike...@gmail.com] Sent: Thursday, June 06, 2013 4:34 PM To: r-help@r-project.org Subject: [R] Distance-based non-parametric ANOVA Hello, I am comparing treatments by comparing within group to between group distances like described in MJ Anderson. 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26: 32 -- 46. The idea is to find the ratio of within group sum of distance^2 to between group sum of distance^2. Then find the distribution of the statistic by permuting the sample. Has anyone done this before in R code? Are there any packages (I did checked on CRAN - none). Code? I want to make sure I won't be inventing a bicycle before I write my own package. Comments? Suggestions? (I have never written an R package before) Thank you for your time, Mikhail. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] glm.nb error
Thanks for all your help. I will try to fix this. Very helpful. Best, Daofeng On Fri, Jun 7, 2013 at 11:25 AM, Sarah Goslee sarah.gos...@gmail.comwrote: As Marc already pointed out, take a close look at this part of your loop: R i - 6 R R y - as.numeric(data[i,-1]) R y [1] 3 3 3 3 4 4 4 4 R group [1] 1 1 1 1 0 0 0 0 R fit - glm.nb(y~group) Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed What do you expect to happen there? In general, it's a better practice to pre-specify the size of result (eg matrix(NA, nrow=n, ncol=4) ) and fill it as you go, rather than using rbind() within a loop. Much more memory-efficient. Sarah On Fri, Jun 7, 2013 at 11:58 AM, Daofeng Li lid...@gmail.com wrote: Sorry Sarah. dput(dat) structure(list(gene = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L), .Label = c(gene1, gene10, gene2, gene3, gene4, gene5, gene6, gene7, gene8, gene9), class = factor), b1 = c(18L, 15L, 10L, 4L, 0L, 3L, 0L, 4L, 11L, 6L), b2 = c(15L, 8L, 9L, 0L, 1L, 3L, 4L, 4L, 6L, 3L), b3 = c(13L, 8L, 8L, 4L, 0L, 3L, 0L, 7L, 13L, 6L), b4 = c(13L, 7L, 12L, 3L, 0L, 3L, 2L, 3L, 10L, 3L), c1 = c(16L, 0L, 9L, 0L, 1L, 4L, 2L, 0L, 13L, 4L), c2 = c(9L, 12L, 11L, 5L, 5L, 4L, 2L, 6L, 7L, 7L), c3 = c(20L, 18L, 12L, 0L, 1L, 4L, 0L, 6L, 12L, 6L), c4 = c(24L, 4L, 12L, 0L, 0L, 4L, 0L, 12L, 9L, 3L)), .Names = c(gene, b1, b2, b3, b4, c1, c2, c3, c4), class = data.frame, row.names = c(NA, -10L)) above was the dput(dat). Thanks. Daofeng On Fri, Jun 7, 2013 at 10:47 AM, Sarah Goslee sarah.gos...@gmail.com wrote: Hi, On Fri, Jun 7, 2013 at 11:36 AM, Daofeng Li lid...@gmail.com wrote: Thank you Sarah and Marc for your fast and nice response. Apology for didn't include all information. I have a input file like following: gene1 18 15 13 13 16 9 20 24 gene2 15 8 8 7 0 12 18 4 gene3 10 9 8 12 9 11 12 12 gene4 4 0 4 3 0 5 0 0 gene5 0 1 0 0 1 5 1 0 gene6 3 3 3 3 4 4 4 4 gene7 0 4 0 2 2 2 0 0 gene8 4 4 7 3 0 6 6 12 gene9 11 6 13 10 13 7 12 9 gene10 6 3 6 3 4 7 6 3 dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) Is this what's in the test file that your code reads in? We don't have that file, so can't run your code. If it is, then the output of dput(dat) would be enormously more useful than copy and pasting your data file into your email. And yes, the full code is very little like the pair of lines you originally pasted in. Sarah I am using following R code to compare the difference between the 1st 4 numbers against 2nd 4 numbers: library(MASS) library(coin) suppressPackageStartupMessages(suppressWarnings(library(tcltk))) library(qvalue) library(plyr) dat = read.table(test,col.names=c(gene,b1,b2,b3,b4,c1,c2,c3,c4)) index=(apply(dat[,-1],1,sum)0) data = dat[index,] group=c(1,1,1,1,0,0,0,0) n=nrow(data) result=NULL for (i in 1:n){ print(i) y=as.numeric(data[i,-1]) if (all((y-mean(y))==0)) result=rbind(result,c(0,0,0,1)) else { fit=glm.nb(y~group) result=rbind(result,summary(fit)$coef[2,]) } } qval = qvalue(result[,4]) fdr=0.1 index=(qval$qvaluesfdr) dat.result = data[index,] write.table(dat.result,file=test_result,row.names=F,quote=F) If you use this input file and code, would reproduce the same error: Loading required package: methods Loading required package: survival Loading required package: splines Loading required package: mvtnorm Loading required package: modeltools Loading required package: stats4 Attaching package: plyr The following object is masked from package:modeltools: empty [1] 1 [1] 2 [1] 3 [1] 4 [1] 5 [1] 6 Error in while ((it - it + 1) limit abs(del) eps) { : missing value where TRUE/FALSE needed Calls: glm.nb - as.vector - theta.ml In addition: Warning messages: 1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached 2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace : iteration limit reached Execution halted So might be the error was in 6th line, not the line I pasted before (5th line)? Sorry about that. Thanks. Daofeng On Fri, Jun 7, 2013 at 10:15 AM, Marc Schwartz marc_schwa...@me.com wrote: On Jun 7, 2013, at 9:44 AM, Daofeng Li lid...@gmail.com wrote: Dear R Community, I have encountered a problem while using the R function glm.nb. The code that produce the error was following two lines: group=c(1,1,1,1,0,0,0,0) fit=glm.nb(y~group) While the y contains 8 sets of number like: gene2750 1 0 0 1 5 1 0 Error message:
Re: [R] SQL queries in R
?sqlQuery as.is - argument Andrija On Jun 7, 2013 9:10 PM, Sneha Bishnoi sneha.bish...@gmail.com wrote: Hey all! I am trying to select a bunch of id's (data type -character) from a table and store them in a variable in R But when i do this, it automatically truncates the leading zero's in id's even though they are of character type. code is :- myconn-odbcConnect(testdata) sql.select-paste(select UNIT_ID from UNITS where (UNIT_TYPE=',unit,' and COMMUNITY=',property,'),sep=) unit_ids-sqlQuery(myconn,sql.select) print(unit_ids) is there anyway i can retain the UNIT_ID's as they are. Thanks! Sneha [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Package 'ChoiceModelR' - Xbetas.csv? Rbetas.csv? betadraws?
Hope to get some clarity from the creators of a (wonderful) package ChoiceModelR. I am using the main function 'choicemodelr'. Everything works. ?choicemodelr says average of MCMC draws of unit-level model coefficients are written to Xbetas.csv Question 1: Is this a typo? The real file created in the working directory seems to be Rbetas.csv Questions 2: Are these the averages of the 'use' draws (specified in my choicemodelr statement)? Question 3: If I had constraints, I assume these averages will be taking into account constraints, right? ?choicemodelr also says that choicemodelr produces an object that contains, among other things, two elemens: betadraw An ni by natt by floor(use/keep) array of MCMC random draws of unit-level multinomial logit model parameter estimates. betadraw.c An ni by natt by floor(use/keep) array of constrained MCMC random draws of unit-level multinomial logit model parameter estimates. If I do: apply(out$betadraw.c,c(1,2),mean) then what is the difference between the result and the file written to Disk? One difference I noticed: apply(out$betadraw.c, c(1,2),mean) contains as many columns as parameters, not as many columns as the number of levels. What else? Thank you very much! -- Dimitri Liakhovitski [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Horizontal labels on a plot
On Jun 7, 2013, at 8:38 AM, Shane Carey wrote: Hi, I have a plot with horizontal lables on the x-axis. For example: Devonian volcanic rocks n=2 with n=2 on a new line. How do I centre n=2 under the Devonian volcanic rocks label? This was my code: text(axis_text, par(usr)[three], labels = paste(LABELS, \n , n =, t(t(name.count[,two]))), srt = txt_di, adj = txtadj, xpd = TRUE, cex=txt_cex) Try: …, labels=bquote( atop(.(LABELS) , n==.(name.count[,two]) ) ), … (Unable to test with all those undefined parameters.) ?plotmath # where one can read that \n is not honored by plotmath functions, including paste(). ?bquote -- David where LABELS=Devonian volcanic rocks and name.count[,two] =2 David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rJava is not loading
Thanks again, Brian. Indeed, I had Java for 32 bits. Replaced it and rJava is working now. On Thu, Jun 6, 2013 at 10:58 AM, Prof Brian Ripley rip...@stats.ox.ac.ukwrote: On 06/06/2013 15:52, Dimitri Liakhovitski wrote: Thank you very much, Brian. It's clearly christal clear - but one needs a christal ball to realize that! :-) So I learned: architecture = bitness On current Windows, yes. But not on OS X nor Linux nor Solaris nor FreeBSD It indicates the type of CPU, or more precisely the instruction set the CPU is running. x86_64 CPUs (as used by most current Windows boxes) can emulate i386 CPUs. So the architectures for R on Windows are i386 (32-bit) x64 (64-bit, the Windows name for x86_64). Dimitri On Thu, Jun 6, 2013 at 2:19 AM, Prof Brian Ripley rip...@stats.ox.ac.uk mailto:rip...@stats.ox.ac.uk** wrote: On 06/06/2013 00:38, Dimitri Liakhovitski wrote: Hello! I installed rJava and am trying to load it. library(rJava) Error : .onLoad failed in loadNamespace() for 'rJava', details: call: fun(libname, pkgname) error: No CurrentVersion entry in Software/JavaSoft registry! Try re-installing Java and make sure R and Java have matching architectures. Error: package or namespace load failed for rJava Any idea why? Background info: But the posting guide asked for the output from sessionInfo(): we want to know what R knows it is running as, not why you think. Windows 7, 64-bit R version 3.0.1 (for 64-bit) I just installed the lastest Java: Java 7 Update 21 For the right architecture? The message is crystal clear: you do not have a Java installed matching your R. Most likely your Java is 32-bit and your R 64-bit or v.v. -- Brian D. Ripley, rip...@stats.ox.ac.uk mailto:rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~__**ripley/http://www.stats.ox.ac.uk/~__ripley/ http://www.stats.ox.ac.uk/~**ripley/http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 tel:%2B44%201865%20272861 (self) 1 South Parks Road, +44 1865 272866 tel:%2B44%201865%20272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 tel:%2B44%201865%20272595 -- Dimitri Liakhovitski -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~**ripley/http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 -- Dimitri Liakhovitski [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQL queries in R
tried as.is ,gives an error, [1] 01000 10054 [Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionWrite (send()). [2] [RODBC] ERROR: Could not SQLExecDirect 'select UNIT_ID from UNITS where (UNIT_TYPE='1X1' and COMMUNITY='SAN1193')' On Fri, Jun 7, 2013 at 3:21 PM, andrija djurovic djandr...@gmail.comwrote: ?sqlQuery as.is - argument Andrija On Jun 7, 2013 9:10 PM, Sneha Bishnoi sneha.bish...@gmail.com wrote: Hey all! I am trying to select a bunch of id's (data type -character) from a table and store them in a variable in R But when i do this, it automatically truncates the leading zero's in id's even though they are of character type. code is :- myconn-odbcConnect(testdata) sql.select-paste(select UNIT_ID from UNITS where (UNIT_TYPE=',unit,' and COMMUNITY=',property,'),sep=) unit_ids-sqlQuery(myconn,sql.select) print(unit_ids) is there anyway i can retain the UNIT_ID's as they are. Thanks! Sneha [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Sneha Bishnoi +14047235469 H. Milton Stewart School of Industrial Systems Engineering Georgia Tech [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQL queries in R
myconn-odbcConnect(testdata) sql.select-paste(select UNIT_ID from UNITS where (UNIT_TYPE=',unit,' and COMMUNITY=',property,'),sep=) unit_ids-sqlQuery(myconn,sql.select,as.is=TRUE) This should works if myconn and sql.select are defined properly Andrija On Jun 7, 2013 9:58 PM, Sneha Bishnoi sneha.bish...@gmail.com wrote: tried as.is ,gives an error, [1] 01000 10054 [Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionWrite (send()). [2] [RODBC] ERROR: Could not SQLExecDirect 'select UNIT_ID from UNITS where (UNIT_TYPE='1X1' and COMMUNITY='SAN1193')' On Fri, Jun 7, 2013 at 3:21 PM, andrija djurovic djandr...@gmail.comwrote: ?sqlQuery as.is - argument Andrija On Jun 7, 2013 9:10 PM, Sneha Bishnoi sneha.bish...@gmail.com wrote: Hey all! I am trying to select a bunch of id's (data type -character) from a table and store them in a variable in R But when i do this, it automatically truncates the leading zero's in id's even though they are of character type. code is :- myconn-odbcConnect(testdata) sql.select-paste(select UNIT_ID from UNITS where (UNIT_TYPE=',unit,' and COMMUNITY=',property,'),sep=) unit_ids-sqlQuery(myconn,sql.select) print(unit_ids) is there anyway i can retain the UNIT_ID's as they are. Thanks! Sneha [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Sneha Bishnoi +14047235469 H. Milton Stewart School of Industrial Systems Engineering Georgia Tech [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RCOde need urgent HELP
hey i need help with a case in my R project. i have a vector Rt with returns from day 1 till day t now i want to form a new vector r= which is rt = Xt − Xt−1 then i want to definde positive and negative aggregates r(n)+ _t= 1/n (rt +...+rt−n) I {(rt+...+ rt−n) ≥0} r(n)− _t= 1/n (rt +...+rt−n) I {(rt+...+ rt−n) 0} thanks for the help -- View this message in context: http://r.789695.n4.nabble.com/RCOde-need-urgent-HELP-tp4668981.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] gamm in mgcv random effect significance
Dear R-helpers, I'd like to understand how to test the statistical significance of a random effect in gamm. I am using gamm because I want to test a model with an AR(1) error structure, and it is my understanding neither gam nor gamm4 will do the latter. The data set includes nine short interrupted time series (single case designs in education, sometimes called N-of-1 trials in medicine) from one study. They report a proportion as outcome (y), computed from a behavior either observed or not out of 10 trials per time point. Hence I use binomial (I believe quasi-binomial is not available in gamm). Each of the nine series has an average of 30 observations give or take (total 264 observations), some under treatment (z) and some not. xc is centered session number, int is the z*xc interaction. Based on prior work, xc is also smoothed Consider, for example, two models, both with AR(1) but one allowing a random effect on xc: g1 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, correlation=corAR1()) g2 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, random = list(xc=~1),correlation=corAR1()) I include the output for g1 and g2 below, but the question is how to test the significance of the random effect on xc. I considered a test comparing the Log-Likelihoods, but have no idea what the degrees of freedom would be given that s(xc) is smoothed. I also tried: anova(g1$gam, g2$gam) that did not seem to return anything useful for this question. A related question is how to test the significance of adding a second random effect to a model that already has a random effect, such as: g3 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random = list(Case=~1, z=~1),correlation=corAR1()) g4 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random = list(Case=~1, z=~1, int=~1),correlation=corAR1()) Any help would be appreciated. Thanks. Will Shadish g1 $lme Linear mixed-effects model fit by maximum likelihood Data: data Log-likelihood: -437.696 Fixed: fixed X(Intercept) Xz XintXs(xc)Fx1 0.6738466 -2.56883170.0137415 -0.1801294 Random effects: Formula: ~Xr - 1 | g Structure: pdIdnot Xr1 Xr2 Xr3 Xr4 Xr5 Xr6 Xr7 Xr8 Residual StdDev: 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 1.693177 Correlation Structure: AR(1) Formula: ~1 | g Parameter estimate(s): Phi 0.3110725 Variance function: Structure: fixed weights Formula: ~invwt Number of Observations: 264 Number of Groups: 1 $gam Family: binomial Link function: logit Formula: y ~ s(xc) + z + int Estimated degrees of freedom: 1 total = 4 attr(,class) [1] gamm list g2 $lme Linear mixed-effects model fit by maximum likelihood Data: data Log-likelihood: -443.9495 Fixed: fixed X(Intercept) Xz XintXs(xc)Fx1 0.720018143 -2.562155820 0.003457463 -0.045821030 Random effects: Formula: ~Xr - 1 | g Structure: pdIdnot Xr1 Xr2 Xr3 Xr4 Xr5 Xr6 Xr7 Xr8 StdDev: 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 Formula: ~1 | xc %in% g (Intercept) Residual StdDev: 6.277279e-05 1.683007 Correlation Structure: AR(1) Formula: ~1 | g/xc Parameter estimate(s): Phi 0.1809409 Variance function: Structure: fixed weights Formula: ~invwt Number of Observations: 264 Number of Groups: g xc %in% g 134 $gam Family: binomial Link function: logit Formula: y ~ s(xc) + z + int Estimated degrees of freedom: 1 total = 4 attr(,class) [1] gamm list -- William R. Shadish Distinguished Professor Founding Faculty Mailing Address: William R. Shadish University of California School of Social Sciences, Humanities and Arts 5200 North Lake Rd Merced CA 95343 Physical/Delivery Address: University of California Merced ATTN: William Shadish School of Social Sciences, Humanities and Arts Facilities Services Building A 5200 North Lake Rd. Merced, CA 95343 209-228-4372 voice 209-228-4007 fax (communal fax: be sure to include cover sheet) wshad...@ucmerced.edu http://faculty.ucmerced.edu/wshadish/index.htm http://psychology.ucmerced.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matched samples, dataframe, panel data
Thank you very much. I apologize but I want to ask only one more thing: How to do if I want in addition to impose that the diffence between dimensions doesn't exced an absolute value, for example, 200. Firm 1 continues matching with firm 5 in year 2000 because 2189-2120=69, and 69 is 10%x2120=212, and 69 is also 200 But firm 20 doesn't match with firm 17 in year 2001 because 6754-6438=316, and 316 is 10%x6438=643.8, but 316 is 200 Thank you Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 17:30 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Sorry, it included a case with both values for dummy were 1. lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]}) res-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) nrow(subset(res,dummy==0)) #[1] 22 nrow(subset(res,dummy==1)) #[1] 22 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 12:03 PM Subject: RE: [R] matched samples, dataframe, panel data Sorry, Something is not ok, because when I do nrow(subset(res,res$dummy==0)) [1] 22 nrow(subset(res,res$dummy==1)) [1] 24 There the number o observatios in the two subsets is not equal. Thanks again, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:43 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Hi, Not sure if this is what you wanted. res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) # this combines the list of lists to a data.frame res[1:4,] # firm year industry dummy dimension #11 2000 20 0 2120 #25 2000 20 1 2189 #34 2000 20 0 3178 #47 2000 20 1 3245 #or res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res)) row.names(res)-1:nrow(res) res[1:4,] # firm year industry dummy dimension group #11 2000 20 0 2120 2000.20 #1 group #25 2000 20 1 2189 2000.20 #1 #34 2000 20 0 3178 2000.20 #2 #47 2000 20 1 3245 2000.20 #2 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: Sent: Friday, June 7, 2013 11:33 AM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. Just a little thing: how can I put it like a dataframe? Thanks, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:27 Para: Cecilia Carmo Assunto: Re: [R] matched samples, dataframe, panel data Hi, There could be easier ways... I am a bit busy now to try other ways. - Original Message - From: arun smartpink...@yahoo.com To: Cecilia Carmo cecilia.ca...@ua.pt Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 11:25 AM Subject: Re: [R] matched samples, dataframe, panel data Hi, May be this helps: lst1-split(final3,list(final3$year,final3$industry)) lst2-lst1[lapply(lst1,nrow)0] lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y (x$dimension+x$dimension*0.1)) (y (x$dimension-x$dimension*0.1)),])) lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2]) lst5-lapply(lst4,function(x)x[!duplicated(x)]) lst6-lst5[lapply(lst5,length)0] names(lst6) # [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30 #[8] 2001.40 2002.40 2003.40 2004.40 lst6[2000.20] #$`2000.20` #$`2000.20`[[1]] # firm year industry dummy dimension #1 1 2000 20 0 2120 #215 2000 20 1 2189 # #$`2000.20`[[2]] # firm year industry dummy dimension #164 2000 20 0 3178 #317 2000 20 1 3245 # #$`2000.20`[[3]] # firm year industry dummy dimension #113 2000 20 1 4532 #6 2 2000 20 0 4890 A.K. From: Cecilia Carmo cecilia.ca...@ua.pt To: r-help@r-project.org r-help@r-project.org Sent: Friday, June 7, 2013 9:56 AM Subject: Re: [R] matched samples, dataframe, panel data Again my problem, better explained. #I have a data panel of thousands of firms, by year and industry and #one dummy variable that identifies one kind of firms (1 if the firm have an auditor; 0 if not) #and another variable the represents the firm dimension (total assets in thousand of euros) #I need to create two separated samples with the same number os firms where #one firm in the first have a corresponding firm in the second with the same #year, industry and dimension (the dimension doesn't need to be exatly the #same, it could vary in an interval of +/- 10%, for example) #My
Re: [R] gamm in mgcv random effect significance
On Fri, 2013-06-07 at 13:12 -0700, William Shadish wrote: Dear R-helpers, I'd like to understand how to test the statistical significance of a random effect in gamm. I am using gamm because I want to test a model with an AR(1) error structure, and it is my understanding neither gam nor gamm4 will do the latter. gamm4() can't yes and out of the box mgcv::gam can't either but see ?magic for an example of correlated errors and how the fits can be manipulated to take the AR(1) (or any structure really as far as I can tell) into account. You might like to look at mgcv::bam() which allows an known AR(1) term but do check that it does what you think; with a random effect spline I'm not at all certain that it will nest the AR(1) in the random effect level. snip / Consider, for example, two models, both with AR(1) but one allowing a random effect on xc: g1 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, correlation=corAR1()) g2 - gamm(y ~ s(xc) +z+ int,family=binomial, weights=trial, random = list(xc=~1),correlation=corAR1()) Shouldn't you specify how the AR(1) is nested in the hierarchy here, i.e. AR(1) within xc? maybe I'm not following your data structure correctly. I include the output for g1 and g2 below, but the question is how to test the significance of the random effect on xc. I considered a test comparing the Log-Likelihoods, but have no idea what the degrees of freedom would be given that s(xc) is smoothed. I also tried: anova(g1$gam, g2$gam) gamm() fits via the lme() function of package nlme. To do what you want, you need the anova() method for objects of class lme, e.g. anova(g1$lme, g2$lme) Then I think you should check if the fits were done via REML and also be aware of the issue of testing wether a variance term is 0. that did not seem to return anything useful for this question. A related question is how to test the significance of adding a second random effect to a model that already has a random effect, such as: g3 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random = list(Case=~1, z=~1),correlation=corAR1()) g4 - gamm(y ~ xc +z+ s(int),family=binomial, weights=trial, random = list(Case=~1, z=~1, int=~1),correlation=corAR1()) Again, I think you need anova() on the $lme components. HTH G Any help would be appreciated. Thanks. Will Shadish g1 $lme Linear mixed-effects model fit by maximum likelihood Data: data Log-likelihood: -437.696 Fixed: fixed X(Intercept) Xz XintXs(xc)Fx1 0.6738466 -2.56883170.0137415 -0.1801294 Random effects: Formula: ~Xr - 1 | g Structure: pdIdnot Xr1 Xr2 Xr3 Xr4 Xr5 Xr6 Xr7 Xr8 Residual StdDev: 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 0.0004377781 1.693177 Correlation Structure: AR(1) Formula: ~1 | g Parameter estimate(s): Phi 0.3110725 Variance function: Structure: fixed weights Formula: ~invwt Number of Observations: 264 Number of Groups: 1 $gam Family: binomial Link function: logit Formula: y ~ s(xc) + z + int Estimated degrees of freedom: 1 total = 4 attr(,class) [1] gamm list g2 $lme Linear mixed-effects model fit by maximum likelihood Data: data Log-likelihood: -443.9495 Fixed: fixed X(Intercept) Xz XintXs(xc)Fx1 0.720018143 -2.562155820 0.003457463 -0.045821030 Random effects: Formula: ~Xr - 1 | g Structure: pdIdnot Xr1 Xr2 Xr3 Xr4 Xr5 Xr6 Xr7 Xr8 StdDev: 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 7.056078e-06 Formula: ~1 | xc %in% g (Intercept) Residual StdDev: 6.277279e-05 1.683007 Correlation Structure: AR(1) Formula: ~1 | g/xc Parameter estimate(s): Phi 0.1809409 Variance function: Structure: fixed weights Formula: ~invwt Number of Observations: 264 Number of Groups: g xc %in% g 134 $gam Family: binomial Link function: logit Formula: y ~ s(xc) + z + int Estimated degrees of freedom: 1 total = 4 attr(,class) [1] gamm list -- Gavin Simpson, PhD [t] +1 306 337 8863 Adjunct Professor, Department of Biology[f] +1 306 337 2410 Institute of Environmental Change Society [e] gavin.simp...@uregina.ca 523 Research and Innovation Centre [tw] @ucfagls University of Regina Regina, SK S4S 0A2, Canada __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] matched samples, dataframe, panel data
HI, Do you want to impose this on 'res' on 'final3'? If it is on `res`: res1-do.call(rbind,lapply(split(res,(1:nrow(res)-1)%/%2+1),function(x) x[abs(diff(x$dimension))200,])) row.names(res1)-1:nrow(res1) nrow(subset(res1,dummy==0)) #[1] 14 nrow(subset(res1,dummy==1)) #[1] 14 head(res1) # firm year industry dummy dimension #1 1 2000 20 0 2120 #2 5 2000 20 1 2189 #3 4 2000 20 0 3178 #4 7 2000 20 1 3245 #5 4 2001 20 0 2678 #6 2 2001 20 1 2789 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: R help r-help@r-project.org Cc: arun smartpink...@yahoo.com Sent: Friday, June 7, 2013 5:51 PM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. I apologize but I want to ask only one more thing: How to do if I want in addition to impose that the diffence between dimensions doesn't exced an absolute value, for example, 200. Firm 1 continues matching with firm 5 in year 2000 because 2189-2120=69, and 69 is 10%x2120=212, and 69 is also 200 But firm 20 doesn't match with firm 17 in year 2001 because 6754-6438=316, and 316 is 10%x6438=643.8, but 316 is 200 Thank you Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 17:30 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Sorry, it included a case with both values for dummy were 1. lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]}) res-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) nrow(subset(res,dummy==0)) #[1] 22 nrow(subset(res,dummy==1)) #[1] 22 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 12:03 PM Subject: RE: [R] matched samples, dataframe, panel data Sorry, Something is not ok, because when I do nrow(subset(res,res$dummy==0)) [1] 22 nrow(subset(res,res$dummy==1)) [1] 24 There the number o observatios in the two subsets is not equal. Thanks again, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:43 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Hi, Not sure if this is what you wanted. res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) # this combines the list of lists to a data.frame res[1:4,] # firm year industry dummy dimension #1 1 2000 20 0 2120 #2 5 2000 20 1 2189 #3 4 2000 20 0 3178 #4 7 2000 20 1 3245 #or res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res)) row.names(res)-1:nrow(res) res[1:4,] # firm year industry dummy dimension group #1 1 2000 20 0 2120 2000.20 #1 group #2 5 2000 20 1 2189 2000.20 #1 #3 4 2000 20 0 3178 2000.20 #2 #4 7 2000 20 1 3245 2000.20 #2 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: Sent: Friday, June 7, 2013 11:33 AM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. Just a little thing: how can I put it like a dataframe? Thanks, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:27 Para: Cecilia Carmo Assunto: Re: [R] matched samples, dataframe, panel data Hi, There could be easier ways... I am a bit busy now to try other ways. - Original Message - From: arun smartpink...@yahoo.com To: Cecilia Carmo cecilia.ca...@ua.pt Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 11:25 AM Subject: Re: [R] matched samples, dataframe, panel data Hi, May be this helps: lst1-split(final3,list(final3$year,final3$industry)) lst2-lst1[lapply(lst1,nrow)0] lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y (x$dimension+x$dimension*0.1)) (y (x$dimension-x$dimension*0.1)),])) lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2]) lst5-lapply(lst4,function(x)x[!duplicated(x)]) lst6-lst5[lapply(lst5,length)0] names(lst6) # [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30 #[8] 2001.40 2002.40 2003.40 2004.40 lst6[2000.20] #$`2000.20` #$`2000.20`[[1]] # firm year industry dummy dimension #1 1 2000 20 0 2120 #21 5 2000 20 1 2189 # #$`2000.20`[[2]] # firm year industry dummy dimension #16 4 2000 20 0 3178 #31 7 2000 20 1 3245 # #$`2000.20`[[3]] # firm year industry dummy dimension #11 3 2000 20 1 4532 #6 2 2000 20 0 4890 A.K.
Re: [R] matched samples, dataframe, panel data
Maybe the final result is the same, but I want to impose it on final3, when we are matching the firms. Thanks, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 23:04 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data HI, Do you want to impose this on 'res' on 'final3'? If it is on `res`: res1-do.call(rbind,lapply(split(res,(1:nrow(res)-1)%/%2+1),function(x) x[abs(diff(x$dimension))200,])) row.names(res1)-1:nrow(res1) nrow(subset(res1,dummy==0)) #[1] 14 nrow(subset(res1,dummy==1)) #[1] 14 head(res1) # firm year industry dummy dimension #11 2000 20 0 2120 #25 2000 20 1 2189 #34 2000 20 0 3178 #47 2000 20 1 3245 #54 2001 20 0 2678 #62 2001 20 1 2789 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: R help r-help@r-project.org Cc: arun smartpink...@yahoo.com Sent: Friday, June 7, 2013 5:51 PM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. I apologize but I want to ask only one more thing: How to do if I want in addition to impose that the diffence between dimensions doesn't exced an absolute value, for example, 200. Firm 1 continues matching with firm 5 in year 2000 because 2189-2120=69, and 69 is 10%x2120=212, and 69 is also 200 But firm 20 doesn't match with firm 17 in year 2001 because 6754-6438=316, and 316 is 10%x6438=643.8, but 316 is 200 Thank you Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 17:30 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Sorry, it included a case with both values for dummy were 1. lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]}) res-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) nrow(subset(res,dummy==0)) #[1] 22 nrow(subset(res,dummy==1)) #[1] 22 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 12:03 PM Subject: RE: [R] matched samples, dataframe, panel data Sorry, Something is not ok, because when I do nrow(subset(res,res$dummy==0)) [1] 22 nrow(subset(res,res$dummy==1)) [1] 24 There the number o observatios in the two subsets is not equal. Thanks again, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:43 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Hi, Not sure if this is what you wanted. res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) # this combines the list of lists to a data.frame res[1:4,] # firm year industry dummy dimension #11 2000 20 0 2120 #25 2000 20 1 2189 #34 2000 20 0 3178 #47 2000 20 1 3245 #or res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res)) row.names(res)-1:nrow(res) res[1:4,] # firm year industry dummy dimension group #11 2000 20 0 2120 2000.20 #1 group #25 2000 20 1 2189 2000.20 #1 #34 2000 20 0 3178 2000.20 #2 #47 2000 20 1 3245 2000.20 #2 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: Sent: Friday, June 7, 2013 11:33 AM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. Just a little thing: how can I put it like a dataframe? Thanks, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:27 Para: Cecilia Carmo Assunto: Re: [R] matched samples, dataframe, panel data Hi, There could be easier ways... I am a bit busy now to try other ways. - Original Message - From: arun smartpink...@yahoo.com To: Cecilia Carmo cecilia.ca...@ua.pt Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 11:25 AM Subject: Re: [R] matched samples, dataframe, panel data Hi, May be this helps: lst1-split(final3,list(final3$year,final3$industry)) lst2-lst1[lapply(lst1,nrow)0] lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) x[(y (x$dimension+x$dimension*0.1)) (y (x$dimension-x$dimension*0.1)),])) lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2]) lst5-lapply(lst4,function(x)x[!duplicated(x)]) lst6-lst5[lapply(lst5,length)0] names(lst6) # [1] 2000.20 2001.20 2002.20 2003.20 2004.20 2001.30 2002.30 #[8] 2001.40 2002.40 2003.40 2004.40 lst6[2000.20] #$`2000.20` #$`2000.20`[[1]] # firm year industry dummy dimension #1 1 2000 20 0 2120 #215
Re: [R] matched samples, dataframe, panel data
Hi, If i replace `lst3` with: lst1-split(final3,list(final3$year,final3$industry)) lst2-lst1[lapply(lst1,nrow)0] lst3-lapply(lst2,function(x) lapply(x$dimension,function(y) {x1-x[(y (x$dimension+x$dimension*0.1)) (y (x$dimension-x$dimension*0.1)),]; x1[abs(diff(x1$dimension))200,]})) ###changed lst4-lapply(lst3,function(x) x[lapply(x,nrow)==2]) lst5-lapply(lst4,function(x)x[!duplicated(x)]) lst6-lst5[lapply(lst5,length)0] lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]}) res2-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x))) identical(res1,res2) #[1] TRUE A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 6:10 PM Subject: RE: [R] matched samples, dataframe, panel data Maybe the final result is the same, but I want to impose it on final3, when we are matching the firms. Thanks, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 23:04 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data HI, Do you want to impose this on 'res' on 'final3'? If it is on `res`: res1-do.call(rbind,lapply(split(res,(1:nrow(res)-1)%/%2+1),function(x) x[abs(diff(x$dimension))200,])) row.names(res1)-1:nrow(res1) nrow(subset(res1,dummy==0)) #[1] 14 nrow(subset(res1,dummy==1)) #[1] 14 head(res1) # firm year industry dummy dimension #1 1 2000 20 0 2120 #2 5 2000 20 1 2189 #3 4 2000 20 0 3178 #4 7 2000 20 1 3245 #5 4 2001 20 0 2678 #6 2 2001 20 1 2789 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: R help r-help@r-project.org Cc: arun smartpink...@yahoo.com Sent: Friday, June 7, 2013 5:51 PM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. I apologize but I want to ask only one more thing: How to do if I want in addition to impose that the diffence between dimensions doesn't exced an absolute value, for example, 200. Firm 1 continues matching with firm 5 in year 2000 because 2189-2120=69, and 69 is 10%x2120=212, and 69 is also 200 But firm 20 doesn't match with firm 17 in year 2001 because 6754-6438=316, and 316 is 10%x6438=643.8, but 316 is 200 Thank you Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 17:30 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Sorry, it included a case with both values for dummy were 1. lst7-lapply(lst6,function(x) {lst-lapply(x,function(y) y[sum(y$dummy)==1,]);lst[lapply(lst,nrow)0]}) res-do.call(rbind,lapply(lst7,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) nrow(subset(res,dummy==0)) #[1] 22 nrow(subset(res,dummy==1)) #[1] 22 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 12:03 PM Subject: RE: [R] matched samples, dataframe, panel data Sorry, Something is not ok, because when I do nrow(subset(res,res$dummy==0)) [1] 22 nrow(subset(res,res$dummy==1)) [1] 24 There the number o observatios in the two subsets is not equal. Thanks again, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:43 Para: Cecilia Carmo Cc: R help Assunto: Re: [R] matched samples, dataframe, panel data Hi, Not sure if this is what you wanted. res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) row.names(res)-1:nrow(res) # this combines the list of lists to a data.frame res[1:4,] # firm year industry dummy dimension #1 1 2000 20 0 2120 #2 5 2000 20 1 2189 #3 4 2000 20 0 3178 #4 7 2000 20 1 3245 #or res-do.call(rbind,lapply(lst6,function(x) do.call(rbind,x))) res$group-gsub((.*\\..*)\\..*$,\\1,rownames(res)) row.names(res)-1:nrow(res) res[1:4,] # firm year industry dummy dimension group #1 1 2000 20 0 2120 2000.20 #1 group #2 5 2000 20 1 2189 2000.20 #1 #3 4 2000 20 0 3178 2000.20 #2 #4 7 2000 20 1 3245 2000.20 #2 A.K. - Original Message - From: Cecilia Carmo cecilia.ca...@ua.pt To: arun smartpink...@yahoo.com Cc: Sent: Friday, June 7, 2013 11:33 AM Subject: RE: [R] matched samples, dataframe, panel data Thank you very much. Just a little thing: how can I put it like a dataframe? Thanks, Cecília De: arun [smartpink...@yahoo.com] Enviado: sexta-feira, 7 de Junho de 2013 16:27 Para: Cecilia Carmo Assunto: Re: [R] matched samples, dataframe, panel data Hi, There could be easier ways... I am a bit busy now to try
[R] It seams that fast99 function (sensitivity package) does not work out for norm distribution.
Dear all mailing listers, Does Anyone have the same problem as mine when using the fast99 (extended-FAST method) to perform SA of model with norm distribution inputs? See the simple example given following. Any suggestion will be greatly appreciated. Thank you! Marino # Simple example # 1. uniform version (It works well) library(sensitivity) Myfun-function(x){return(rowSums(x))} SA1 - fast99(model = Myfun, factors = 3, n = 1000, q = qunif, q.arg = list(min = 0, max = 1)) SA1 Call: fast99(model = Myfun, factors = 3, n = 1000, q = qunif, q.arg = list(min = 0, max = 1)) Model runs: 3000 Estimations of the indices: first order total order X1 0.3335243 0.3350584 X2 0.3335243 0.3350584 X3 0.3335243 0.3350584 # 2. norm version (it does not work) SA2 - fast99(model = Myfun, factors = 3, n = 1000, q = qnorm, q.arg = list(mean = 0, sd = 1)) SA2 Call: fast99(model = Myfun, factors = 3, n = 1000, q = qnorm, q.arg = list(mean = 0, sd = 1)) Model runs: 3000 Estimations of the indices: first order total order X1 NA NA X2 NA NA X3 NA NA [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trying to install NADA in R 3.0.0
Try contacting Dr Dennis Helsel, the developer at dhel...@practicalstats.com On Fri, Jun 7, 2013 at 1:47 PM, David Doyle kydaviddo...@gmail.com wrote: Hello folks, Im trying to install the NADA package in R 3.0.0 It has been archived so I tried to downloading it and installing it locally. I get utils:::menuInstallLocal() package NADA successfully unpacked and MD5 sums checked But then when I load it I get library(NADA) Error in library(NADA) : NADA is not a valid installed package Any sugestions [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SPlus script
On 07/06/13 23:05, Duncan Murdoch wrote: On 13-06-06 6:22 PM, Rolf Turner wrote: On 07/06/13 03:19, Scott Raynaud wrote: I actually had tried placing arguments in the call but it didn't work. However, I did not think about writing it to a variable and printing. That seems to have done the trick. Funny, I don't remember having to do that before, but that's not surprising. If I remember correctly --- haven't used Splus for decades --- this is a difference between Splus and R. In R the output of a function is returned *invisibly* if that function is called from within another function. And source() is one such other function. Actually this depends on the caller. source() does return its results invisibly, but many other functions don't. From FAQ 7.16: If you type '1+1' or 'summary(glm(y~x+z, family=binomial))' at the command line the returned value is automatically printed (unless it is |invisible()|), but in other circumstances, such as in a |source()|d file or ***inside a function*** it isn't printed unless you specifically print it. (Emphasis added.) I think that you have misinterpreted what I wrote. Many (most?) functions *return* their results (values) visibly. But if you put an expression into the code of that function (an expression which is not part of the returned value) you never see the result of evaluating that expression. E.g.: foo - function(x) { sin(42) x^2 } foo(3) [1] 9 The value of sin(42) is never seen. The main point however is that IIRC Splus is different from R in respect of whether the values of (un-assigned) expressions inside source are visible. In R they are invisible; in Splus I *believe* (vaguely recall) that they are visible. I cannot check this since I have no access to Splus. I do wish someone would confirm (or deny, as the case may be) that my recollections about Splus are correct. cheers, Rolf [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Trying to install NADA in R 3.0.0
On Fri, 7 Jun 2013, Janh Anni wrote: Try contacting Dr Dennis Helsel, the developer at dhel...@practicalstats.com I don't know what OS David's using, nor if R-3.0.0 is an upgrade or a new installation, but I had no issues when upgrading R from the last 2.x version. The existing NADA in the system library works with 3.x. I'm running Slackware-13.1 and -14.0. Iniially I ran install.packages(NADA), selected the closest repository, and had a working package available for use. Rich -- Richard B. Shepard, Ph.D. | Have knowledge, will travel. Applied Ecosystem Services, Inc. | http://www.appl-ecosys.com Voice: 503-667-4517 Fax: 503-667-8863 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SPlus script
foo - function(x) { sin(42) x^2 } foo(3) [1] 9 The value of sin(42) is never seen. This is true in both R and S+. Only the return value of a function is available for autoprinting and sin(42) is not the return value. Perhaps you are remembering the case where the last subexpression in the function is an assignment, in which case S+ autoprints its value but R does not. R f - function(x) { y - x + 1 } R f(10) R .Last.value [1] 11 S+ f - function(x) { y - x + 1 } Warning messages: Last expression in function is an assignment (You probably wanted to return the left-hand side) in: y - x + 1 S+ f(10) [1] 11 S+ .Last.value [1] 11 Autoprinting is implemented quite differently in R and S+. I think R attaches the autoprint flag to the returned object while S+ assigns .Auto.print-TRUE in the frame of the caller (yuck). That leads to differences like the following R f - function(x) invisible(x+1) R g - function(x) 10 * x R g(f(23)) # g() and f() are evaluated in same frame [1] 240 R .Last.value [1] 240 S+ f - function(x) invisible(x+1) S+ g - function(x) 10 * x S+ g(f(23)) # g() and f() are evaluated in same frame S+ .Last.value [1] 240 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rolf Turner Sent: Friday, June 07, 2013 4:17 PM To: Duncan Murdoch Cc: r-help@r-project.org; Scott Raynaud Subject: Re: [R] SPlus script On 07/06/13 23:05, Duncan Murdoch wrote: On 13-06-06 6:22 PM, Rolf Turner wrote: On 07/06/13 03:19, Scott Raynaud wrote: I actually had tried placing arguments in the call but it didn't work. However, I did not think about writing it to a variable and printing. That seems to have done the trick. Funny, I don't remember having to do that before, but that's not surprising. If I remember correctly --- haven't used Splus for decades --- this is a difference between Splus and R. In R the output of a function is returned *invisibly* if that function is called from within another function. And source() is one such other function. Actually this depends on the caller. source() does return its results invisibly, but many other functions don't. From FAQ 7.16: If you type '1+1' or 'summary(glm(y~x+z, family=binomial))' at the command line the returned value is automatically printed (unless it is |invisible()|), but in other circumstances, such as in a |source()|d file or ***inside a function*** it isn't printed unless you specifically print it. (Emphasis added.) I think that you have misinterpreted what I wrote. Many (most?) functions *return* their results (values) visibly. But if you put an expression into the code of that function (an expression which is not part of the returned value) you never see the result of evaluating that expression. E.g.: foo - function(x) { sin(42) x^2 } foo(3) [1] 9 The value of sin(42) is never seen. The main point however is that IIRC Splus is different from R in respect of whether the values of (un-assigned) expressions inside source are visible. In R they are invisible; in Splus I *believe* (vaguely recall) that they are visible. I cannot check this since I have no access to Splus. I do wish someone would confirm (or deny, as the case may be) that my recollections about Splus are correct. cheers, Rolf [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SPlus script
Very interesting. But what explicitly happens with ***source()*** in Splus??? cheers, Rolf On 08/06/13 11:33, William Dunlap wrote: foo - function(x) { sin(42) x^2 } foo(3) [1] 9 The value of sin(42) is never seen. This is true in both R and S+. Only the return value of a function is available for autoprinting and sin(42) is not the return value. Perhaps you are remembering the case where the last subexpression in the function is an assignment, in which case S+ autoprints its value but R does not. R f - function(x) { y - x + 1 } R f(10) R .Last.value [1] 11 S+ f - function(x) { y - x + 1 } Warning messages: Last expression in function is an assignment (You probably wanted to return the left-hand side) in: y - x + 1 S+ f(10) [1] 11 S+ .Last.value [1] 11 Autoprinting is implemented quite differently in R and S+. I think R attaches the autoprint flag to the returned object while S+ assigns .Auto.print-TRUE in the frame of the caller (yuck). That leads to differences like the following R f - function(x) invisible(x+1) R g - function(x) 10 * x R g(f(23)) # g() and f() are evaluated in same frame [1] 240 R .Last.value [1] 240 S+ f - function(x) invisible(x+1) S+ g - function(x) 10 * x S+ g(f(23)) # g() and f() are evaluated in same frame S+ .Last.value [1] 240 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rolf Turner Sent: Friday, June 07, 2013 4:17 PM To: Duncan Murdoch Cc: r-help@r-project.org; Scott Raynaud Subject: Re: [R] SPlus script On 07/06/13 23:05, Duncan Murdoch wrote: On 13-06-06 6:22 PM, Rolf Turner wrote: On 07/06/13 03:19, Scott Raynaud wrote: I actually had tried placing arguments in the call but it didn't work. However, I did not think about writing it to a variable and printing. That seems to have done the trick. Funny, I don't remember having to do that before, but that's not surprising. If I remember correctly --- haven't used Splus for decades --- this is a difference between Splus and R. In R the output of a function is returned *invisibly* if that function is called from within another function. And source() is one such other function. Actually this depends on the caller. source() does return its results invisibly, but many other functions don't. From FAQ 7.16: If you type '1+1' or 'summary(glm(y~x+z, family=binomial))' at the command line the returned value is automatically printed (unless it is |invisible()|), but in other circumstances, such as in a |source()|d file or ***inside a function*** it isn't printed unless you specifically print it. (Emphasis added.) I think that you have misinterpreted what I wrote. Many (most?) functions *return* their results (values) visibly. But if you put an expression into the code of that function (an expression which is not part of the returned value) you never see the result of evaluating that expression. E.g.: foo - function(x) { sin(42) x^2 } foo(3) [1] 9 The value of sin(42) is never seen. The main point however is that IIRC Splus is different from R in respect of whether the values of (un-assigned) expressions inside source are visible. In R they are invisible; in Splus I *believe* (vaguely recall) that they are visible. I cannot check this since I have no access to Splus. I do wish someone would confirm (or deny, as the case may be) that my recollections about Splus are correct. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] estimation of covariance matrix of a bivariate normal distribution using maximization of the log-likelihood
See in line below. On 08/06/13 00:00, Hertzog Gladys wrote: Dear all, I’m new in R and I’m trying to estimate the covariance matrix of a bivariate normal distribution by maximizing the log-likelihood. The maximization really has to be performed with the non-linear minimization routine (nlm). Why? The 2 means of the distribution are known and equal to 1. But you simulate your data using means 2.4 and 1.3. I’ve already tried 2 different ways to compute this covariance but for each of them I obtained a lot of warnings and illogical values for the covariance matrix. In the first one, I defined the bivariate normal distribution with the command dmvnorm: x-rnorm(6000, 2.4, 0.6) x - matrix(c(x), ncol=1) Why do you use matrix() and c() here? Totally unnecessary (and confusing). y-rlnorm(6000, 1.3,0.1) y - matrix(c(y), ncol=1) XY - cbind(x,y) To estimate the covariance matrix underlying the data XY you could simply do: M - var(XY). If you want to impose the constraint that the true mean vector is c(1,1) you could do: mu - c(1,1) xbar - apply(XY,2,mean) M - (5999/6000)*var(XY) + (xbar-mu)%*%t(xbar-mu) It would of course make more sense in terms of your simulated data to use: mu - c(2.4,1.3) I haven't looked at your code below any further. Too much of a mess. cheers, Rolf Turner L - function(par,x,y) { return (-sum(log(par[4]*dmvnorm(XY, mean=c(1,1), sigma= matrix(c(par[1], par[1]*par[2]*par[3],par[1]*par[2]*par[3], par[2] ),nrow=2, ncol=2)) ))) } par.start- c(0.5, 0.5 ,0.5 ,0.5) result-nlm(L,par.start,y=y,x=x, hessian=TRUE) par.hat - result$estimate par.hatIl y a eu 32 avis (utilisez warnings() pour les visionner) par.hat - result$estimate par.hat [1] 5.149919e+01 2.520721e+02 8.734212e-03 3.996771e+02 warnings() Messages d'avis : 1: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 2: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 3: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 4: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 5: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 6: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 7: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 8: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 9: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 10: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 11: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN 12: In nlm(L, par.start, y = y, x = x, hessian = TRUE) : NA/Inf replaced by maximum positive value 13: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) : production de NaN In the second one, I wrote step by step the bivariate normal distribution in order to have each parameter separately (not in a matrix) but it didn’t work as well: x-rnorm(6000, 2.4, 0.6) y-rlnorm(6000, 1.3,0.1) L - function(par,x,y) { return (-sum(log((1-par[4])*( (1/(2*pi*par[1]*par[2]*sqrt(1-par[3])))*exp( (-1/2*(1-par[3]^2))* ((y-1)/par[2])^2 +((x-1)/par[1])^2 - 2*(y-1)*(x-1)/(par[2]*par[1]) )) ))) } #par [1]= sigma_x , par [2]= sigma_y par [3]= rho_xy par[4] is a mixing parameter. The final step of my calculation will be to have a mixture of bivariate normal and log-normal distributions. par.start- c(0.5, 0.5 ,0.5 ,0.5) result-nlm(L,par.start,y=y,x=x, hessian=T) par.hat - result$estimate par.ha When I run this script, I get always 50 advices like those below: Messages d'avis : 1: In sqrt(1 - par[3]) : production de NaN 2: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value 3: In sqrt(1 - par[3]) : production de NaN 4: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value 5: In sqrt(1 - par[3]) : production de NaN 6: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value 7: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] * ... : production de NaN 8: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value 9: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] * ... : production de NaN 10: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value 11: In log((1 - par[4]) * ((1/(2 * pi * par[1] * par[2] * ... : production de NaN 12: In nlm(L, par.start, y = y, x = x, hessian = T) : NA/Inf replaced by maximum positive value …… Does one of you know how to use the nlm method to estimate the covariance matrix (and mixing parameter ) of a bivariate normal distribution? Thank you
[R] optim() over bogus parameters
Hello, I'm optimizing a log-likelihood function using the built-in optim() function with the default method. The model is essentially a Weibull distribution where the rate parameter changes based on a single covariate, coded 1 or 0 (that is, when the indicator is 1, the rate parameter changes to a different value, Beta, where this Beta is included as a model parameter). I successfully programmed the model and it seems to work fine. I've done extensive checking, and am confident the model is coded correctly. But I was experimenting, and ran into something odd: if I set the entire covariate vector to zero (meaning Beta is always being multiplied by zero), but leave Beta as a variable to be optimized over, optim() still changes the value of Beta (even though it's always multiplied by zero), and furthermore gives different results for the other parameters, as compared to when I force Beta to be zero and don't optimize over it. When I first saw this, I imagined I had made a coding error somewhere else. BUT, here's the kicker: I modified the code so that the log-likelihood function still includes Beta in the vector of parameters, but draws it (Beta - parameters[4]) and then immediately after that sets it to zero. So essentially, Beta is still in the parameters being optimized, but is immediately set to zero after being called. When I run this model... it does the same thing!! optim() changes the value of Beta, which then changes the values of the other parameters. The ONLY DIFFERENCE between this model and the fixed zero Beta is that the function being optimized still calls Beta from the parameters vector (but, again, sets it to zero immediately--Beta is always zero in any of the calculations). I hope I've explained this clearly. Does anyone have any idea what could be causing this phenomenon? Specifically, it troubles me that the other parameters change. Thanks in advance, Ryan [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SPlus script
Very interesting. But what explicitly happens with ***source()*** in Splus??? Here are some examples where I use the exprs argument instead of making a tempory file and source the file. I don't know what you are looking for. (The OP never showed any calls to the functions in defined in his script, so I don't know what he was doing either.) S+ source( exprs = expression( 12:15, invisible(1/9), log2(1:10)), auto.print=TRUE) S+ 12:15 [1] 12 13 14 15 S+ invisible(1/9) S+ log2(1:10) [1] 0.00 1.00 1.584963 2.00 2.321928 2.584963 2.807355 3.00 [9] 3.169925 3.321928 S+ source( exprs = expression( 12:15, invisible(1/9), log2(1:10)), auto.print=TRUE, echo=FALSE) [1] 12 13 14 15 [1] 0.00 1.00 1.584963 2.00 2.321928 2.584963 2.807355 3.00 [9] 3.169925 3.321928 S+ source( exprs = expression( { 12:15 ; log2(1:10) } ), auto.print=TRUE) S+ { 12:15 log2(1:10) } [1] 0.00 1.00 1.584963 2.00 2.321928 2.584963 2.807355 3.00 [9] 3.169925 3.321928 The next doesn't autoprint because invisible() sets the autprint flag to FALSE in the caller's frame: S+ source( exprs = expression( { 12:15 ; invisible(1/9) ; log2(1:10) } ), auto.print=TRUE) S+ { 12:15 invisible(1/9) log2(1:10) } Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: Rolf Turner [mailto:rolf.tur...@xtra.co.nz] Sent: Friday, June 07, 2013 4:46 PM To: William Dunlap Cc: r-help@r-project.org Subject: Re: [R] SPlus script Very interesting. But what explicitly happens with ***source()*** in Splus??? cheers, Rolf On 08/06/13 11:33, William Dunlap wrote: foo - function(x) { sin(42) x^2 } foo(3) [1] 9 The value of sin(42) is never seen. This is true in both R and S+. Only the return value of a function is available for autoprinting and sin(42) is not the return value. Perhaps you are remembering the case where the last subexpression in the function is an assignment, in which case S+ autoprints its value but R does not. R f - function(x) { y - x + 1 } R f(10) R .Last.value [1] 11 S+ f - function(x) { y - x + 1 } Warning messages: Last expression in function is an assignment (You probably wanted to return the left-hand side) in: y - x + 1 S+ f(10) [1] 11 S+ .Last.value [1] 11 Autoprinting is implemented quite differently in R and S+. I think R attaches the autoprint flag to the returned object while S+ assigns .Auto.print-TRUE in the frame of the caller (yuck). That leads to differences like the following R f - function(x) invisible(x+1) R g - function(x) 10 * x R g(f(23)) # g() and f() are evaluated in same frame [1] 240 R .Last.value [1] 240 S+ f - function(x) invisible(x+1) S+ g - function(x) 10 * x S+ g(f(23)) # g() and f() are evaluated in same frame S+ .Last.value [1] 240 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rolf Turner Sent: Friday, June 07, 2013 4:17 PM To: Duncan Murdoch Cc: r-help@r-project.org; Scott Raynaud Subject: Re: [R] SPlus script On 07/06/13 23:05, Duncan Murdoch wrote: On 13-06-06 6:22 PM, Rolf Turner wrote: On 07/06/13 03:19, Scott Raynaud wrote: I actually had tried placing arguments in the call but it didn't work. However, I did not think about writing it to a variable and printing. That seems to have done the trick. Funny, I don't remember having to do that before, but that's not surprising. If I remember correctly --- haven't used Splus for decades --- this is a difference between Splus and R. In R the output of a function is returned *invisibly* if that function is called from within another function. And source() is one such other function. Actually this depends on the caller. source() does return its results invisibly, but many other functions don't. From FAQ 7.16: If you type '1+1' or 'summary(glm(y~x+z, family=binomial))' at the command line the returned value is automatically printed (unless it is |invisible()|), but in other circumstances, such as in a |source()|d file or ***inside a function*** it isn't printed unless you specifically print it. (Emphasis added.) I think that you have misinterpreted what I wrote. Many (most?) functions *return* their results (values) visibly. But if you put an expression into the code of that function (an expression which is not part of the returned value) you never see the result of evaluating that expression. E.g.: foo - function(x) { sin(42) x^2 }
[R] recode: how to avoid nested ifelse
In our Summer Stats Institute, I was asked a question that amounts to reversing the effect of the contrasts function (reconstruct an ordinal predictor from a set of binary columns). The best I could think of was to link together several ifelse functions, and I don't think I want to do this if the example became any more complicated. I'm unable to remember a less error prone method :). But I expect you might. Here's my working example code ## Paul Johnson pauljohn at ku.edu ## 2013-06-07 ## We need to create an ordinal factor from these indicators ## completed elementary school es - c(0, 0, 1, 0, 1, 0, 1, 1) ## completed high school hs - c(0, 0, 1, 0, 1, 0, 1, 0) ## completed college graduate cg - c(0, 0, 0, 0, 1, 0, 1, 0) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) edf - factor(ed, levels = 0:3, labels = c(none, es, hs, cg)) data.frame(es, hs, cg, ed, edf) ## Looks OK, but what if there are missings? es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) cbind(es, hs, cg, ed) ## That's bad, ifelse returns NA too frequently. ## Revise (becoming tedious!) ed - ifelse(!is.na(cg) cg == 1, 3, ifelse(!is.na(hs) hs == 1, 2, ifelse(!is.na(es) es == 1, 1, ifelse(is.na(es), NA, 0 cbind(es, hs, cg, ed) ## Does the project director want us to worry about ## logical inconsistencies, such as es = 0 but cg = 1? ## I hope not. Thanks in advance, I hope you are having a nice summer. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] recode: how to avoid nested ifelse
Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x - do.call(paste0, list(es, hs, cg)) table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, es,hs, cg))) none es hs cg 4112 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote: In our Summer Stats Institute, I was asked a question that amounts to reversing the effect of the contrasts function (reconstruct an ordinal predictor from a set of binary columns). The best I could think of was to link together several ifelse functions, and I don't think I want to do this if the example became any more complicated. I'm unable to remember a less error prone method :). But I expect you might. Here's my working example code ## Paul Johnson pauljohn at ku.edu ## 2013-06-07 ## We need to create an ordinal factor from these indicators ## completed elementary school es - c(0, 0, 1, 0, 1, 0, 1, 1) ## completed high school hs - c(0, 0, 1, 0, 1, 0, 1, 0) ## completed college graduate cg - c(0, 0, 0, 0, 1, 0, 1, 0) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) edf - factor(ed, levels = 0:3, labels = c(none, es, hs, cg)) data.frame(es, hs, cg, ed, edf) ## Looks OK, but what if there are missings? es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) cbind(es, hs, cg, ed) ## That's bad, ifelse returns NA too frequently. ## Revise (becoming tedious!) ed - ifelse(!is.na(cg) cg == 1, 3, ifelse(!is.na(hs) hs == 1, 2, ifelse(!is.na(es) es == 1, 1, ifelse(is.na(es), NA, 0 cbind(es, hs, cg, ed) ## Does the project director want us to worry about ## logical inconsistencies, such as es = 0 but cg = 1? ## I hope not. Thanks in advance, I hope you are having a nice summer. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] splitting a string column into multiple columns faster
Hello! I have a column in my data frame that I have to split: I have to distill the numbers from the text. Below is my example and my solution. x-data.frame(x=c(aaa1_bbb1_ccc3,aaa2_bbb3_ccc2,aaa3_bbb2_ccc1)) x library(stringr) out-as.data.frame(str_split_fixed(x$x,aaa,2)) out2-as.data.frame(str_split_fixed(out$V2,_bbb,2)) out3-as.data.frame(str_split_fixed(out2$V2,_ccc,2)) result-cbind(x,out2[1],out3) result My problem is: str_split.fixed is relatively slow. In my real data frame I have over 80,000 rows so that it takes almost 30 seconds to run just one line (like out-... above) And it's even slower because I have to do it step-by-step many times. Any way to do it by specifying all 3 delimiters at once (aaa,_bbb,_ccc) and then split it in one swoop into a data frame with several columns? Thanks a lot for any pointers! -- Dimitri Liakhovitski [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] recode: how to avoid nested ifelse
I would do this to get the highest non-missing level: x - pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) rock chalk... -nfultz On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote: Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x - do.call(paste0, list(es, hs, cg)) table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, es,hs, cg))) none es hs cg 4112 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote: In our Summer Stats Institute, I was asked a question that amounts to reversing the effect of the contrasts function (reconstruct an ordinal predictor from a set of binary columns). The best I could think of was to link together several ifelse functions, and I don't think I want to do this if the example became any more complicated. I'm unable to remember a less error prone method :). But I expect you might. Here's my working example code ## Paul Johnson pauljohn at ku.edu ## 2013-06-07 ## We need to create an ordinal factor from these indicators ## completed elementary school es - c(0, 0, 1, 0, 1, 0, 1, 1) ## completed high school hs - c(0, 0, 1, 0, 1, 0, 1, 0) ## completed college graduate cg - c(0, 0, 0, 0, 1, 0, 1, 0) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) edf - factor(ed, levels = 0:3, labels = c(none, es, hs, cg)) data.frame(es, hs, cg, ed, edf) ## Looks OK, but what if there are missings? es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) cbind(es, hs, cg, ed) ## That's bad, ifelse returns NA too frequently. ## Revise (becoming tedious!) ed - ifelse(!is.na(cg) cg == 1, 3, ifelse(!is.na(hs) hs == 1, 2, ifelse(!is.na(es) es == 1, 1, ifelse(is.na(es), NA, 0 cbind(es, hs, cg, ed) ## Does the project director want us to worry about ## logical inconsistencies, such as es = 0 but cg = 1? ## I hope not. Thanks in advance, I hope you are having a nice summer. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] splitting a string column into multiple columns faster
HI, May be this helps: res-data.frame(x=x,read.table(text=gsub([A-Za-z],,x[,1]),sep=_,header=FALSE),stringsAsFactors=FALSE) res # x V1 V2 V3 #1 aaa1_bbb1_ccc3 1 1 3 #2 aaa2_bbb3_ccc2 2 3 2 #3 aaa3_bbb2_ccc1 3 2 1 A.K. - Original Message - From: Dimitri Liakhovitski dimitri.liakhovit...@gmail.com To: r-help r-help@r-project.org Cc: Sent: Friday, June 7, 2013 9:24 PM Subject: [R] splitting a string column into multiple columns faster Hello! I have a column in my data frame that I have to split: I have to distill the numbers from the text. Below is my example and my solution. x-data.frame(x=c(aaa1_bbb1_ccc3,aaa2_bbb3_ccc2,aaa3_bbb2_ccc1)) x library(stringr) out-as.data.frame(str_split_fixed(x$x,aaa,2)) out2-as.data.frame(str_split_fixed(out$V2,_bbb,2)) out3-as.data.frame(str_split_fixed(out2$V2,_ccc,2)) result-cbind(x,out2[1],out3) result My problem is: str_split.fixed is relatively slow. In my real data frame I have over 80,000 rows so that it takes almost 30 seconds to run just one line (like out-... above) And it's even slower because I have to do it step-by-step many times. Any way to do it by specifying all 3 delimiters at once (aaa,_bbb,_ccc) and then split it in one swoop into a data frame with several columns? Thanks a lot for any pointers! -- Dimitri Liakhovitski [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] recode: how to avoid nested ifelse
I still argue for na.rm=FALSE, but that is cute, also substantially faster f1 - function(x1, x2, x3) do.call(paste0, list(x1, x2, x3)) f2 - function(x1, x2, x3) pmax(3*x3, 2*x2, es, 0, na.rm=FALSE) f3 - function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3)) f4 - function(x1, x2, x3) rowSums(cbind(x1, x2, x3)) es - rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000) hs - rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000) cg - rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000) system.time(replicate(1000, f1(cg, hs, es))) system.time(replicate(1000, f2(cg, hs, es))) system.time(replicate(1000, f3(cg, hs, es))) system.time(replicate(1000, f4(cg, hs, es))) system.time(replicate(1000, f1(cg, hs, es))) user system elapsed 22.730.03 22.76 system.time(replicate(1000, f2(cg, hs, es))) user system elapsed 0.920.040.95 system.time(replicate(1000, f3(cg, hs, es))) user system elapsed 0.190.020.20 system.time(replicate(1000, f4(cg, hs, es))) user system elapsed 0.950.030.98 R version 3.0.0 (2013-04-03) Platform: x86_64-w64-mingw32/x64 (64-bit) On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz nfu...@gmail.com wrote: I would do this to get the highest non-missing level: x - pmax(3*cg, 2*hs, es, 0, na.rm=TRUE) rock chalk... -nfultz On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote: Hi Paul, Unless you have truly offended the data generating oracle*, the pattern: NA, 1, NA, should be a data entry error --- graduating HS implies graduating ES, no? I would argue fringe cases like that should be corrected in the data, not through coding work arounds. Then you can just do: x - do.call(paste0, list(es, hs, cg)) table(factor(x, levels = c(000, 100, 110, 111), labels = c(none, es,hs, cg))) none es hs cg 4112 Cheers, Josh *Drawn from comments by Judea Pearl one lively session. On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson pauljoh...@gmail.com wrote: In our Summer Stats Institute, I was asked a question that amounts to reversing the effect of the contrasts function (reconstruct an ordinal predictor from a set of binary columns). The best I could think of was to link together several ifelse functions, and I don't think I want to do this if the example became any more complicated. I'm unable to remember a less error prone method :). But I expect you might. Here's my working example code ## Paul Johnson pauljohn at ku.edu ## 2013-06-07 ## We need to create an ordinal factor from these indicators ## completed elementary school es - c(0, 0, 1, 0, 1, 0, 1, 1) ## completed high school hs - c(0, 0, 1, 0, 1, 0, 1, 0) ## completed college graduate cg - c(0, 0, 0, 0, 1, 0, 1, 0) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) edf - factor(ed, levels = 0:3, labels = c(none, es, hs, cg)) data.frame(es, hs, cg, ed, edf) ## Looks OK, but what if there are missings? es - c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA) hs - c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA) cg - c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA) ed - ifelse(cg == 1, 3, ifelse(hs == 1, 2, ifelse(es == 1, 1, 0))) cbind(es, hs, cg, ed) ## That's bad, ifelse returns NA too frequently. ## Revise (becoming tedious!) ed - ifelse(!is.na(cg) cg == 1, 3, ifelse(!is.na(hs) hs == 1, 2, ifelse(!is.na(es) es == 1, 1, ifelse(is.na(es), NA, 0 cbind(es, hs, cg, ed) ## Does the project director want us to worry about ## logical inconsistencies, such as es = 0 but cg = 1? ## I hope not. Thanks in advance, I hope you are having a nice summer. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com
[R] Regression Tolerance Intervals - Dr. Young's Code
Hello, Below is a reproducible example to generate the output by using Dr. Young's R code on the above subject . As commented below, the issue is that part of the code (regtol.int and plottol) does not seem to work. I would appreciate receiving your advice toward resolving the issue. Thanks and regards, Pradip Muhuri setwd (E:/) require (tolerance) d1- xlndur ylnant 8.910797 0.33901690 9.001415 0.36464311 8.983936 0.53976194 8.948035 0.33901690 9.056784 0.39266961 9.018593 0.18617770 9.001415 0.53976194 8.983936 -0.11005034 8.966147 0.53102826 8.948035 0.59885086 6.90 NA xd1 - read.table(textConnection(d1), header=TRUE, as.is=TRUE) print (xd1); str (xd1) #This code works xout1 - regtol.int (reg=lm (formula=ylnant ~ xlndur, data=xd1), alpha=.05, P=0.99, side=2) print (xout1) #This code does not work xout2 - regtol.int (reg=lm (formula=ylnant ~ xlndur, data=xd1), new.xlndur = NULL, alpha=.05, P=0.99, side=2) print (xout2) #This code does not work plottol(xout1, x=cbind(1,x), y=y, side=two, x.lab=X, y.lab=Y ) #This code does not work plottol(xout2, x=cbind(1,x), y=y, side=two, x.lab=X, y.lab=Y ) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] splitting a string column into multiple columns faster
HI, Tried it on 1e5 row dataset: l1- letters[1:10] s1-sapply(seq_along(l1),function(i) paste(rep(l1[i],3),collapse=)) set.seed(24) x1-data.frame(x=paste(paste0(sample(s1,1e5,replace=TRUE),sample(1:15,1e5,replace=TRUE)),paste0(sample(s1,1e5,replace=TRUE),sample(1:15,1e5,replace=TRUE)),paste0(sample(s1,1e5,replace=TRUE),sample(1:15,1e5,replace=TRUE)),sep=_),stringsAsFactors=FALSE) system.time(resNew-data.frame(x=x1,read.table(text=gsub([A-Za-z],,x1[,1]),sep=_,header=FALSE),stringsAsFactors=FALSE)) # user system elapsed # 2.712 0.016 2.732 head(resNew) # x V1 V2 V3 #1 ccc12_ggg2_jjj14 12 2 14 #2 ccc7_ddd15_aaa11 7 15 11 #3 hhh12_ddd14_fff12 12 14 12 #4 fff11_bbb15_aaa6 11 15 6 #5 ggg12_ccc9_ggg8 12 9 8 #6 jjj8_eee12_eee4 8 12 4 A.K. - Original Message - From: arun smartpink...@yahoo.com To: Dimitri Liakhovitski dimitri.liakhovit...@gmail.com Cc: R help r-help@r-project.org Sent: Friday, June 7, 2013 11:00 PM Subject: Re: [R] splitting a string column into multiple columns faster HI, May be this helps: res-data.frame(x=x,read.table(text=gsub([A-Za-z],,x[,1]),sep=_,header=FALSE),stringsAsFactors=FALSE) res # x V1 V2 V3 #1 aaa1_bbb1_ccc3 1 1 3 #2 aaa2_bbb3_ccc2 2 3 2 #3 aaa3_bbb2_ccc1 3 2 1 A.K. - Original Message - From: Dimitri Liakhovitski dimitri.liakhovit...@gmail.com To: r-help r-help@r-project.org Cc: Sent: Friday, June 7, 2013 9:24 PM Subject: [R] splitting a string column into multiple columns faster Hello! I have a column in my data frame that I have to split: I have to distill the numbers from the text. Below is my example and my solution. x-data.frame(x=c(aaa1_bbb1_ccc3,aaa2_bbb3_ccc2,aaa3_bbb2_ccc1)) x library(stringr) out-as.data.frame(str_split_fixed(x$x,aaa,2)) out2-as.data.frame(str_split_fixed(out$V2,_bbb,2)) out3-as.data.frame(str_split_fixed(out2$V2,_ccc,2)) result-cbind(x,out2[1],out3) result My problem is: str_split.fixed is relatively slow. In my real data frame I have over 80,000 rows so that it takes almost 30 seconds to run just one line (like out-... above) And it's even slower because I have to do it step-by-step many times. Any way to do it by specifying all 3 delimiters at once (aaa,_bbb,_ccc) and then split it in one swoop into a data frame with several columns? Thanks a lot for any pointers! -- Dimitri Liakhovitski [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.