[R] matrix manipulation
This is as much a mathematics as an R question, in the "this should be easy but I don't see it" category. Assume I have a full-rank p by p matrix V (aside: V = (X'X)^{-1} for a particular setup), a p by k matrix B, and I want to complete an orthogonal basis for the space with distance function V. That is, find A such that t(B) %*% V %*% A = 0, where A has p rows and p-k columns. With V = identity this is easy: I can do it in 1-2 lines using qr(), lm(), or several other tools. A part of me is quite certain that the general problem isn't more than 3 lines of R, but after a day of beating my head on the issue I still don't see it. Math-wise it looks like a simple homework problem in a mid-level class, but I'm not currently sure that I'd pass said class. If someone could show the way I would be grateful. Either that or assurance that the problem actually IS hard and I'm not as dense as I think.

Terry T.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (ordinal) logistic regression
Dear list, I've been looking at previous posts on the list, but I haven't found any close enough to my question/problem. My data can be seen as a matrix of multiple individuals (columns) with (rather independent) measures (lines). Based on supplemental information, the individuals are organized in (multiple) ordered classes. I would like to test for which type of measure (i.e. which line from my matrix of data) the groups are distinct (e.g. different by group mean). In other words, I would like to see in which line of my input matrix the measures for the groups of individuals associate with distinct group values. So I tried glm (or glm2) on each line of the matrix, but in my toy example (shown below) I'm surprised to get warnings about non-convergence in the nice toy cases (i.e. distinct groups, as I am looking for), even with glm2! I see that in such nice cases with glm() the Pr(>|z|) is close to 1, which at first sight is OK (since H0: coefficient = 0), but I suppose the test is not really set up right this way. When trying lrm (rms package) I even get an error message (Unable to fit model using "lrm.fit") with the nice cases. In my real data with 4000 lines of data (i.e. 4000 glm tests) multiple-testing correction would push everything from 1-p up to p = 1, so that's another problem with this approach. I suppose somehow I should transform my data (I don't see where I would change the H0?) to obtain low and NOT high p-values (example below) in the cases I'm looking for, i.e. when group means are distinct. Any suggestions? Thanks in advance, Wolfgang

Here my toy example:

datB1 <- c(12,14:16,18:21,20:22,20,22:24,19.5) # fits partially/overlapping to 3-grp model
datB2 <- c(11:15,21:25,31:36)       # too beautiful to be real ...
datB3 <- c(15:12,11:15,12:14,15:12) # no fit to 3-grp model
datB4 <- c(11:15,15:11,21:26)       # grpA == grpB but grpA != grpC
datB <- rbind(datB1,datB2,datB3,datB4)
set.seed(2015)
datB <- datB + round(runif(length(datB),-0.3,0.3),1) # add some noise
datB <- datB - rowMeans(datB) # centering
## here the definition of the groups
grpB <- gl(3,6,labels=LETTERS[1:3])[c(-6,-7)]
table(grpB)
## display
layout(1:4)
for(i in 1:4) plot(datB[i,],as.numeric(grpB))
## now the 'test'
glmLi <- function(dat,grp) { ## run glm : predict grp based on dat
  dat <- data.frame(dat=dat,grp=grp)
  glm(grp ~ dat, data=dat, family=binomial)}
logitB <- apply(datB,1,glmLi,grpB)
lapply(logitB,summary)
sapply(logitB,function(x) summary(x)$coefficients[,4])
# lines 1 and 2 are designed to be 'positive' but give high p-values with convergence problems

## for completeness
sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] glm2_1.1.2 MASS_7.3-40 TinnRcom_1.0.18 formatR_1.2 svSocket_0.9-57

loaded via a namespace (and not attached):
[1] tools_3.2.0 svMisc_0.9-70 tcltk_3.2.0
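A minimal, self-contained illustration (toy data, not Wolfgang's) of the convergence warning glm() raises when the groups are perfectly separated, which is what the "nice" cases above construct:

```r
# Complete separation: the predictor values for the two groups do not overlap.
x <- c(1:5, 21:25)
g <- factor(rep(c("A", "B"), each = 5))
m <- suppressWarnings(glm(g ~ x, family = binomial))
# glm() normally warns that fitted probabilities numerically 0 or 1 occurred;
# the slope estimate diverges, its Wald SE explodes, and the Wald p-value
# is pushed toward 1 even though the groups are as distinct as possible.
p <- summary(m)$coefficients["x", 4]
```
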
Re: [R] Speeding up code?
Here is one improvement. Avoid dataframes in some of these cases. This creates a character matrix and then converts to a dataframe after doing the transpose of the matrix. It takes less than 10 seconds on my system:

library(stringr)
# create character matrix; avoid dataframes in this case
print(proc.time())
   user  system elapsed
  15.52    5.24  587.70
xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
# convert to dataframe and do transpose on matrix and not dataframe
separoPairs <- as.data.frame(t(xm), stringsAsFactors = FALSE)
print(proc.time())
   user  system elapsed
  20.90    5.36  596.57

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 16, 2015 at 7:56 AM, Ignacio Martinez ignaci...@gmail.com wrote:

Hi Collin, The objective of the gen.names function is to generate N *unique* random names, where N is a *large* number. On my computer `gen.names(n = 5)` takes under a second, so it is probably not the root problem in my code. That said, I would love to improve it. I'm not exactly sure how you propose to change it using sample. What is the object that I would be sampling? I would love to run a little benchmark to compare my version with yours. What really takes a long time to run is:

separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-"))

So that and the chunk of code before it is probably where I would get big gains in speed. Sadly, I have no clue how to do it differently. Thanks a lot for the help!! Ignacio

On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch cfly...@ncsu.edu wrote:

Hi Ignacio, If I am reading your code correctly then the top while loop is essentially seeking to select a random set of names from the original set, then using unique to reduce it down; you then iterate until you have built your quota. Ultimately this results in a very inefficient attempt at sampling without replacement. Why not just sample without replacement rather than loop iteratively and use unique? Or if the set of possible names is short enough, why not just randomize it and then pull the first n items off? Best, Collin.

On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez ignaci...@gmail.com wrote:

Hi R-Help! I'm hoping that some of you may give me some tips that could make my code more efficient. More precisely, I would like to make the answer to my stackoverflow question http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions more efficient. This is the code:

library(dplyr)
library(randomNames)
library(geosphere)

set.seed(7142015)
# Define Parameters
n.Schools <- 20
first.grade <- 3
last.grade <- 5
n.Grades <- last.grade - first.grade + 1
n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 # Two classrooms per teacher

# Define random names function:
gen.names <- function(n, which.names = "both", name.order = "last.first"){
  names <- unique(randomNames(n=n, which.names = which.names,
                              name.order = name.order))
  need <- n - length(names)
  while(need > 0){
    names <- unique(c(randomNames(n=need, which.names = which.names,
                                  name.order = name.order), names))
    need <- n - length(names)
  }
  return(names)}

# Generate n.Schools names
gen.schools <- function(n.schools) {
  School.ID <- paste0(gen.names(n = n.schools, which.names = "last"), ' School')
  School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
  School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
  School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
  Schools <- data.frame(School.ID, School.lat, School.long, School.RE) %>%
    mutate(School.ID = as.character(School.ID)) %>%
    rowwise() %>%
    mutate(School.distance = distHaversine(
      p1 = c(School.long, School.lat),
      p2 = c(21.7672, 58.8471), r = 3961))
  return(Schools)}

Schools <- gen.schools(n.schools = n.Schools)

# Generate Grades
Grades <- c(first.grade:last.grade)
# Generate n.Classrooms
Classrooms <- LETTERS[1:n.Classrooms]

# Group schools and grades
SchGr <- outer(paste0(Schools$School.ID, '-'), paste0(Grades, '-'), FUN=paste) # head(SchGr)
# Group SchGr and Classrooms
SchGrClss <- outer(SchGr, paste0(Classrooms, '-'), FUN=paste) # head(SchGrClss)
# These are the combinations of School-Grade-Classroom
SchGrClssTmp <- as.matrix(SchGrClss, ncol=1, nrow=length(SchGrClss))
SchGrClssEnd <- as.data.frame(SchGrClssTmp)
# Assign n.Teachers (2 classrooms in a given school-grade)
Allpairs <- as.data.frame(t(combn(SchGrClssTmp, 2)))
AllpairsTmp <- paste(Allpairs$V1, Allpairs$V2, sep=" ")
library(stringr)
separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-"))
separoPairs <-
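Jim's matrix trick can be checked at small scale. The following self-contained sketch uses base strsplit() in place of stringr::str_split (same kind of list output, no extra dependency) and toy strings standing in for AllpairsTmp; binding the split list into a character matrix yields the same content as coercing the list directly, which is the slow step at scale:

```r
# Toy stand-ins for AllpairsTmp; both split to equal-length pieces.
pairs <- c("Alpha-3-A", "Beta-4-B")
parts <- strsplit(pairs, "-", fixed = TRUE)
slow  <- as.data.frame(parts)                 # list -> data.frame: slow at scale
xm    <- do.call(rbind, parts)                # list -> character matrix: fast
fast  <- as.data.frame(t(xm), stringsAsFactors = FALSE)
all(as.matrix(slow) == as.matrix(fast))       # identical content, cheaper path
```
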
[R] matching strings in a list
Say I have a list:

[[1]] "I like google"
[[2]] "Hi Google google"
[[3]] "what's up"

and they are tweets. And I want to find out how many tweets mention google (the answer should be 2). If I string-split and unlist them, then I would get the answer of 3. How do I make sure I get just 2?
Re: [R] Speeding up code?
Actually looking at the result, you don't need the transpose; that was an artifact of how you were doing it before.

xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
# convert to dataframe; the transpose is no longer needed
separoPairs <- as.data.frame(xm, stringsAsFactors = FALSE)

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 16, 2015 at 1:37 PM, jim holtman jholt...@gmail.com wrote: [earlier messages in this thread quoted verbatim; see above]
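Collin's suggestion, sketched concretely. This is only an illustration: `pool` is a made-up stand-in for a fixed universe of possible names, whereas randomNames() draws from a distribution, so the real code would need the pool enumerated first.

```r
# sample() without replacement cannot return duplicates, so the
# while/unique() loop in gen.names() becomes a single call.
set.seed(42)
pool   <- paste0("Name", 1:10000)   # hypothetical universe of names
n      <- 500
picked <- sample(pool, n, replace = FALSE)
length(picked)          # exactly n names
anyDuplicated(picked)   # 0: no duplicates by construction
```
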
Re: [R] matrix manipulation
Hi Terry, maybe I'm missing something, but why not define a matrix BB = V'B; then t(B) %*% V = t(BB), and your problem reduces to finding A such that t(BB) %*% A = 0? Peter

On Thu, Jul 16, 2015 at 10:28 AM, Therneau, Terry M., Ph.D. thern...@mayo.edu wrote: [original message quoted verbatim; see above]
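A minimal sketch of that reduction (my own illustration, assuming V is symmetric, as it is for (X'X)^{-1}): once BB = V %*% B is formed, the trailing columns of the complete QR decomposition of BB span the required complement, so the whole thing is indeed about 3 lines.

```r
set.seed(1)
p <- 5; k <- 2
X <- matrix(rnorm(20 * p), 20, p)
V <- solve(crossprod(X))          # V = (X'X)^{-1}, symmetric
B <- matrix(rnorm(p * k), p, k)

BB <- V %*% B                     # t(B) %*% V == t(BB) since V = V'
A  <- qr.Q(qr(BB), complete = TRUE)[, (k + 1):p, drop = FALSE]
max(abs(t(B) %*% V %*% A))        # numerically zero
```
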
Re: [R] matching strings in a list
On Jul 16, 2015, at 12:40 PM, tryingtolearn inshi...@ymail.com wrote: [question quoted above]

See ?grepl, presuming that you just want a count of the matches. If you want it to also be case-insensitive, set 'ignore.case = TRUE'. For example:

Tweets <- list("I like google", "Hi Google google", "what's up")
Tweets
[[1]]
[1] "I like google"

[[2]]
[1] "Hi Google google"

[[3]]
[1] "what's up"

sapply(Tweets, function(x) grepl("google", x, ignore.case = TRUE))
[1]  TRUE  TRUE FALSE

sum(sapply(Tweets, function(x) grepl("google", x, ignore.case = TRUE)))
[1] 2

Regards, Marc Schwartz
Re: [R] matching strings in a list
Why would you strsplit them? I would think

length(grep("google", unlist(x), ignore.case = TRUE))

should do it. Best, Ista

On Thu, Jul 16, 2015 at 1:40 PM, tryingtolearn inshi...@ymail.com wrote: [question quoted above]
Re: [R] matching strings in a list
On Thu, Jul 16, 2015 at 12:40 PM, tryingtolearn inshi...@ymail.com wrote: [question quoted above]

NROW(grep("google", list, ignore.case = TRUE))

--
Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.
Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! John McKown
Re: [R] User defined function within a formula
This might do what you want:

OPoly <- function(x, degree=1, weight=1, coefs=NULL, rangeX=NULL){
  weight <- round(weight, 0) # weight needs to be integer
  if(length(weight) != length(x)) {
    weight <- rep(1, length(x))
  }
  if (is.null(rangeX)) {
    rangeX <- range(x)
  }
  p <- poly(4*(rep(x, weight) - mean(rangeX))/diff(rangeX), degree=degree, coefs=coefs)
  # why t(t(...))? That strips the attributes.
  Z <- t( t(p[cumsum(weight), ]) * sqrt(attr(p, "coefs")$norm2[-seq(2)]) )[, degree, drop=FALSE]
  class(Z) <- "OPoly"
  attr(Z, "coefs") <- attr(p, "coefs")
  attr(Z, "rangeX") <- rangeX
  Z
}

makepredictcall.OPoly <- function(var, call) {
  if (is.call(call)) {
    if (identical(call[[1]], quote(OPoly))) {
      if (!is.null(tmp <- attr(var, "coefs"))) {
        call$coefs <- tmp
      }
      if (!is.null(tmp <- attr(var, "rangeX"))) {
        call$rangeX <- tmp
      }
      call$weight <- 1 # weight not relevant in predictions
    }
  }
  call
}

d <- data.frame(Y=1:8, X=log(1:8), Weight=1:8)
fit <- lm(data=d, Y ~ OPoly(X, degree=2, weight=Weight))
predict(fit)[c(3,8)]
predict(fit, newdata=data.frame(X=d$X[c(3,8)])) # same result

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jul 16, 2015 at 4:39 PM, William Dunlap wdun...@tibco.com wrote:

OPoly <- function(x, degree=1, weight=1){
  weight <- round(weight, 0) # weight needs to be integer
  if(length(weight) != length(x)) weight <- rep(1, length(x))
  p <- poly(4*(rep(x, weight) - mean(range(x)))/diff(range(x)), degree)
  Z <- (t(t(p[cumsum(weight), ]) * sqrt(attr(p, "coefs")$norm2[-seq(2)]))[, degree])
  class(Z) <- "OPoly"; Z
}

You need to make OPoly have optional argument(s) that carry the original-regressor-dependent information to OPoly, and then have it return, as attributes, the values of those arguments. makepredictcall will take the attributes and attach them to the call in predvars, so predict uses values derived from the original regressors, not values derived from the data to be predicted from. Take a look at a pair like makepredictcall.scale() and scale() for an example: scale has optional arguments 'center' and 'scale' that it returns as attributes, and makepredictcall.scale adds those to the call to scale that it is given. Thus when you predict, the scale and center arguments come from the original data, not from the data you are predicting from.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Jul 16, 2015 at 3:43 PM, Kunshan Yin yinkuns...@gmail.com wrote:

Thanks Bill for your quick reply. I tried your solution and it did work for the simple user-defined function xpoly. But when I try it with another function, it gives me an error again:

OPoly <- function(x, degree=1, weight=1){
  weight <- round(weight, 0) # weight needs to be integer
  if(length(weight) != length(x)) weight <- rep(1, length(x))
  p <- poly(4*(rep(x, weight) - mean(range(x)))/diff(range(x)), degree)
  Z <- (t(t(p[cumsum(weight), ]) * sqrt(attr(p, "coefs")$norm2[-seq(2)]))[, degree])
  class(Z) <- "OPoly"; Z
}
## This OPoly mimics a FORTRAN orthogonal polynomial routine: it first maps x to the range [-2, 2], then does QR, then returns the results scaled by sqrt(norm2). Compared with poly, this transformation keeps the model coefficients in a similar range to the other variables; the R poly routine will usually give you very large coefficients. I did not find such a routine in R, so I had to define it as a user-defined function.

### I also have the following function, as you suggested:

makepredictcall.OPoly <- function(var, call) {
  if (is.call(call)) {
    if (identical(call[[1]], quote(OPoly))) {
      if (!is.null(tmp <- attr(var, "coefs"))) {
        call$coefs <- tmp
      }
    }
  }
  call
}

But I still get an error for the following:

g3 <- glm(lot1 ~ log(u) + OPoly(u,1), data = clotting, family = Gamma)
predict(g3, dc)
Error in poly(4 * (rep(x, weight) - mean(range(x)))/diff(range(x)), degree) :
  missing values are not allowed in 'poly'

I thought it might be due to the /diff(range(x)) in the function, but even if I remove that part it still gives me an error. Any idea? Many thanks in advance. Alex

On Thu, Jul 16, 2015 at 2:09 PM, William Dunlap wdun...@tibco.com wrote:

Read about the 'makepredictcall' generic function. There is a method, makepredictcall.poly(), for poly() that attaches the polynomial coefficients used during the fitting procedure to the call to poly() that predict() makes. You ought to supply a similar method for your xpoly(), and xpoly() needs to return an object of a new class that will cause that method to be used. E.g.,

xpoly <- function(x, degree=1, ...){
  ret <- poly(x, degree=degree, ...)
  class(ret) <- "xpoly"
  ret
}
makepredictcall.xpoly <- function(var, call) {
  if (is.call(call)) {
    if (identical(call[[1]], quote(xpoly))) {
      if (!is.null(tmp <- attr(var, "coefs"))) {
        call$coefs <- tmp
      }
    }
  }
  call
}
g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
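Bill's xpoly sketch, made self-contained so it can be run end-to-end (the data frame `d` below is invented for illustration; the `clotting` data in the thread comes from the examples in ?glm):

```r
xpoly <- function(x, degree = 1, ...) {
  ret <- poly(x, degree = degree, ...)
  class(ret) <- "xpoly"
  ret
}
makepredictcall.xpoly <- function(var, call) {
  if (is.call(call) && identical(call[[1]], quote(xpoly))) {
    if (!is.null(tmp <- attr(var, "coefs"))) call$coefs <- tmp
  }
  call
}
d <- data.frame(y = log(1:10), x = 1:10)
fit <- lm(y ~ xpoly(x, 2), data = d)
# With the method in place, predict() on new data reuses the
# orthogonalization coefficients recorded at fit time instead of
# recomputing them from the new data:
all.equal(unname(predict(fit)[3]),
          unname(predict(fit, newdata = data.frame(x = 3))))
```

Without the makepredictcall method, poly() would re-orthogonalize on the single new point and the two predictions would disagree.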
Re: [R] powerTransform warning message?
I suggest you consult a local statistician. You are (way) over your head statistically here, and statistical matters are off topic on this list. The brief answer to your question is: you are almost certainly producing nonsense. Cheers, Bert

Bert Gunter
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll

On Thu, Jul 16, 2015 at 4:35 PM, Brittany Demmitt demmi...@gmail.com wrote: Hello, I have a series of 40 variables that I am trying to transform via the Box-Cox method using the powerTransform function in R. I have no zero values in any of my variables. When I run the powerTransform function on the full data set I get the following warning:

Warning message:
In sqrt(diag(solve(res$hessian))) : NaNs produced

However, when I analyze the variables in groups, rather than all 40 at a time, I do not get this warning message. Why would this be? And does this mean this warning is safe to ignore? I would like to add that all of my lambda values are in the -5 to 5 range. I also get different lambda values when I analyze the variables together versus in groups. Is this to be expected? Thank you so much! Brittany
Re: [R] powerTransform warning message?
Dear Brittany,

On Thu, 16 Jul 2015 17:35:38 -0600 Brittany Demmitt demmi...@gmail.com wrote: [question quoted above]

No, it is not safe to ignore the warning, and the problem has nothing to do with non-positive values in the data -- when you say that there are no 0s in the data, I assume that you mean that the data values are all positive. The square roots of the diagonal entries of the inverse Hessian at the (pseudo-)ML estimates are the SEs of the estimated transformation parameters. If the Hessian can't be inverted, that usually implies that the maximum of the (pseudo-)likelihood isn't well defined. This isn't surprising when you're trying to transform as many as 40 variables at a time to multivariate normality. It's my general experience that people often throw their data into the Box-Cox black box and hope for the best without first examining the data and, e.g., ensuring a reasonable ratio of maximum/minimum values for each variable, checking for extreme outliers, etc. Of course, I don't know that you did that, and it's perfectly possible that you were careful.

As for getting different lambda values when the variables are analyzed together versus in groups: yes, that is to be expected, and it's very unlikely that both are right. If, e.g., the variables are multivariate normal within groups, then their marginal distribution is a mixture of multivariate normals, which almost surely isn't itself normal.

I hope this helps, John

John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
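The mechanics Fox describes can be seen in a toy base-R example (my own illustration, unrelated to powerTransform itself): the SEs are sqrt(diag(solve(H))) where H is the Hessian of the negative log-likelihood at the optimum, which is exactly the expression in the warning.

```r
# MLE of a normal mean with known sd = 1: the SE should be 1/sqrt(n).
set.seed(1)
x   <- rnorm(100, mean = 2, sd = 1)
nll <- function(mu) -sum(dnorm(x, mean = mu, sd = 1, log = TRUE))
fit <- optim(0, nll, method = "BFGS", hessian = TRUE)
se  <- sqrt(diag(solve(fit$hessian)))   # here approximately 1/sqrt(100) = 0.1
# If the Hessian were singular (ill-defined maximum), solve() would fail,
# and negative diagonal entries would trigger the NaN warning seen above.
```
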
Re: [R-es] Set operations
You are comparing the lists as wholes; try instead

a[, vld := mapply(setequal, y, z)]

Regards, Olivier

----- Original message -----
From: Patricio Fuenmayor Viteri patricio.fuenma...@outlook.com
To: r-help-es r-help-es@r-project.org
Sent: Thursday, 16 July 2015 2:48:35
Subject: [R-es] Set operations

Hello everyone. I am trying to compare sets and I don't understand what is happening with the results. Let me explain: I have a column with a person's name, ordered SURNAMES - FIRST NAMES; next I have the same person's name, but ordered FIRST NAMES - SURNAMES. The process should identify that the two columns are equal. I am using set operations and data.table structures. I don't understand why the comparison inside data.table comes out FALSE, i.e. not equal, while doing the comparison separately gives TRUE. The code is attached; thanks for your help.

require(data.table)
a <- data.table(
  x = 1:2,
  y = list(c("ANDRES","GERARDO","CABRERA","GUAMAN"),
           c("MONTALVAN","VERA","JORGE","LEONARDO")),
  z = list(c("CABRERA","GUAMAN","GERADO","ANDRES"),
           c("JORGE","MONTALVAN","VERA")))
a[, `:=`(vld = setequal(y, z)), by = x]
setequal(c("ANDRES","GERARDO","CABRERA","GUAMAN"),
         c("CABRERA","GUAMAN","GERARDO","ANDRES"))

___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
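The distinction Olivier points out, shown in base R (no data.table needed for the illustration): setequal(y, z) compares the two list columns as single objects, while mapply(setequal, y, z) compares them element by element, which is what the row-wise validation wants.

```r
y <- list(c("ANDRES","GERARDO","CABRERA","GUAMAN"),
          c("MONTALVAN","VERA","JORGE","LEONARDO"))
z <- list(c("CABRERA","GUAMAN","GERARDO","ANDRES"),
          c("JORGE","MONTALVAN","VERA"))
# Element-wise: first pair has the same members in a different order (TRUE),
# second pair differs (z lacks "LEONARDO"), so FALSE.
mapply(setequal, y, z)
```
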
[R] problem with wilcox.test()
Dear useRs, I am running wilcox.test() on two subsets of a dataset and get exactly the same results although the raw data differ between the subsets.

mydata <- structure(list(
  cat1 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                   .Label = c("high", "low"), class = "factor"),
  cat2 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
                   .Label = c("large", "small"), class = "factor"),
  var1 = c(2.012743, 1.51272, 1.328453, 1.2609935, 1.617757, 1.8175455,
           1.890035, 2.3652205, 1.295888, 1.5985145, 1.081813, 1.856733,
           2.366358, 2.27421, 1.727023, 2.230433, 5.272843, 3.7626355),
  var2 = c(0.00196, 0.0066545, 0.006188, 0.0058985, 0.004453, 0.005468,
           0.003773, 0.004742, 0.007525, 0.0081235, 0.004611, 0.0050475,
           0.006643, 0.0097335, 0.009213, 0.0049525, 0.006243, 0.006021)),
  .Names = c("cat1", "cat2", "var1", "var2"), row.names = c(NA, 18L),
  class = "data.frame")

# p-values are identical but W different for the first variable
wilcox.test(var1~cat1, data=mydata[mydata$cat2=="large",])
wilcox.test(var1~cat1, data=mydata[mydata$cat2=="small",])

# both p-values and W are identical for the second variable
wilcox.test(var2~cat1, data=mydata[mydata$cat2=="large",])
wilcox.test(var2~cat1, data=mydata[mydata$cat2=="small",])

Did I do something wrong or does it just have something to do with my dataset? Or is it just a coincidence? Thank you in advance for your help, Ivan

--
Ivan Calandra, ATER
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
https://www.researchgate.net/profile/Ivan_Calandra
Re: [R] cronbachs alpha and missing values
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and http://adv-r.had.co.nz/Reproducibility.html Currently we don't even know what version of Cronbach's alpha you are using. Also use Google as well as help(). Try searching for "R statistics Cronbach's alpha". John Kane Kingston ON Canada -Original Message- From: penv...@uni-hamburg.de Sent: Wed, 15 Jul 2015 01:50:46 -0700 (PDT) To: r-help@r-project.org Subject: [R] cronbachs alpha and missing values I want to calculate Cronbach's alpha for my df. My df has some missing values, so that there are only 23 out of 56 complete cases. If I run alpha on only the complete cases, I get a value of .79, and if I run it on the whole df, I get .82. My question is: what does alpha do with those missing values if I include the incomplete cases? Are they imputed in some way? -- View this message in context: http://r.789695.n4.nabble.com/cronbachs-alpha-and-missing-values-tp4709885.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
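As John Kane notes, the answer depends on which implementation of alpha the poster used, which the thread never establishes. The sketch below is one common definition of Cronbach's alpha computed from a covariance matrix; the `use` argument then controls how NAs enter, and switching between complete-case and pairwise-complete covariances is one plausible source of a .79 vs .82 difference (no imputation involved in either case).

```r
# Cronbach's alpha from the item covariance matrix:
# alpha = k/(k-1) * (1 - sum of item variances / variance of the sum)
cronbach_alpha <- function(df, use = "pairwise.complete.obs") {
  C <- cov(df, use = use)              # how NAs are handled is decided here
  k <- ncol(C)
  (k / (k - 1)) * (1 - sum(diag(C)) / sum(C))
}

# Complete cases only (drops every row with any NA):
# cronbach_alpha(mydf, use = "complete.obs")
# Pairwise-complete (each covariance uses all rows available for that pair):
# cronbach_alpha(mydf, use = "pairwise.complete.obs")
```

The two calls typically differ slightly when many cases are incomplete, because pairwise deletion retains more of the data for each covariance.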
Re: [R] Speeding up code?
Hi Collin, The objective of the gen.names function is to generate N *unique* random names, where N is a *large* number. On my computer `gen.names(n = 5)` takes under a second, so it is probably not the root problem in my code. That said, I would love to improve it. I'm not exactly sure how you propose to change it using sample. What is the object that I would be sampling? I would love to run a little benchmark to compare my version with yours. What really takes a long time to run is: separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-")) So that and the chunk of code before it is probably where I would get big gains in speed. Sadly, I have no clue how to do it differently. Thanks a lot for the help!! Ignacio On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch cfly...@ncsu.edu wrote: Hi Ignacio, If I am reading your code correctly, then the top while loop is essentially seeking to select a random set of names from the original set, then using unique to reduce it down; you then iterate until you have built your quota. Ultimately this results in a very inefficient attempt at sampling without replacement. Why not just sample without replacement rather than loop iteratively and use unique? Or if the set of possible names is short enough, why not just randomize it and then pull the first n items off? Best, Collin. On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez ignaci...@gmail.com wrote: Hi R-Help! I'm hoping that some of you may give me some tips that could make my code more efficient. More precisely, I would like to make the answer to my stackoverflow http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions question more efficient.
This is the code:

library(dplyr)
library(randomNames)
library(geosphere)

set.seed(7142015)
# Define Parameters
n.Schools <- 20
first.grade <- 3
last.grade <- 5
n.Grades <- last.grade - first.grade + 1
n.Classrooms <- 20 # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
n.Teachers <- (n.Schools*n.Grades*n.Classrooms)/2 # Two classrooms per teacher

# Define random names function:
gen.names <- function(n, which.names = "both", name.order = "last.first"){
  names <- unique(randomNames(n=n, which.names = which.names, name.order = name.order))
  need <- n - length(names)
  while(need > 0){
    names <- unique(c(randomNames(n=need, which.names = which.names,
                                  name.order = name.order), names))
    need <- n - length(names)
  }
  return(names)}

# Generate n.Schools names
gen.schools <- function(n.schools) {
  School.ID <- paste0(gen.names(n = n.schools, which.names = "last"), ' School')
  School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
  School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
  School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
  Schools <- data.frame(School.ID, School.lat, School.long, School.RE) %>%
    mutate(School.ID = as.character(School.ID)) %>%
    rowwise() %>%
    mutate(School.distance = distHaversine(
      p1 = c(School.long, School.lat), p2 = c(21.7672, 58.8471), r = 3961))
  return(Schools)}

Schools <- gen.schools(n.schools = n.Schools)

# Generate Grades
Grades <- c(first.grade:last.grade)
# Generate n.Classrooms
Classrooms <- LETTERS[1:n.Classrooms]
# Group schools and grades
SchGr <- outer(paste0(Schools$School.ID, '-'), paste0(Grades, '-'), FUN=paste) # head(SchGr)
# Group SchGr and Classrooms
SchGrClss <- outer(SchGr, paste0(Classrooms, '-'), FUN=paste) # head(SchGrClss)
# These are the combinations of School-Grades-Classroom
SchGrClssTmp <- as.matrix(SchGrClss, ncol=1, nrow=length(SchGrClss))
SchGrClssEnd <- as.data.frame(SchGrClssTmp)
# Assign n.Teachers (2 classrooms in a given school-grade)
Allpairs <- as.data.frame(t(combn(SchGrClssTmp, 2)))
AllpairsTmp <- paste(Allpairs$V1, Allpairs$V2, sep=" ")
library(stringr)
separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-"))
separoPairs <- as.data.frame(t(separoPairs))
row.names(separoPairs) <- NULL
separoPairs <- separoPairs %>%
  select(-V7) %>% # Drops empty column
  mutate(V1=as.character(V1), V4=as.character(V4),
         V2=as.numeric(V2), V5=as.numeric(V5)) %>%
  mutate(V4 = trimws(V4, which = "both"))
separoPairs[120,]$V4
# Only the rows with V1==V4 and V2==V5 are valid
validPairs <- separoPairs %>% filter(V1==V4 & V2==V5) %>% select(V1, V2, V3, V6)

# Generate n.Teachers
gen.teachers <- function(n.teachers){
  Teacher.ID <- gen.names(n = n.teachers, name.order = "last.first")
  Teacher.exp <- runif(n = n.teachers, min = 1, max = 30)
  Teacher.Other <- sample(c(0,1), replace = TRUE, prob = c(0.5, 0.5), size = n.teachers)
  Teacher.RE <- rnorm(n = n.teachers, mean = 0, sd = 1)
  Teachers <- data.frame(Teacher.ID, Teacher.exp, Teacher.Other, Teacher.RE)
  return(Teachers)}
Teachers <- gen.teachers(n.teachers = n.Teachers) %>%
  mutate(Teacher.ID = as.character(Teacher.ID))
# Randomly assign n.Teachers teachers to the ValidPairs
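Collin's suggestion of replacing the grow-until-unique while loop can be sketched as: oversample once, deduplicate, and take the first n. The 20% margin below is an assumption, not something from the thread; the while loop remains only as a rare fallback.

```r
# Hedged sketch of the "sample once, then deduplicate" approach.
library(randomNames)  # same package as in the original code

gen.names2 <- function(n, which.names = "both", name.order = "last.first") {
  # Draw ~20% extra so that, after dropping duplicates, n names usually remain
  pool <- unique(randomNames(n = ceiling(n * 1.2),
                             which.names = which.names,
                             name.order = name.order))
  # Rare fallback: top up only if the margin was not enough
  while (length(pool) < n) {
    pool <- unique(c(pool, randomNames(n = n, which.names = which.names,
                                       name.order = name.order)))
  }
  pool[seq_len(n)]
}
```

This does one large draw instead of many small ones, so the cost of randomNames() and unique() is paid once in the common case.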
Re: [R] Problem Regarding R Packages installation
The first error message is "configuration failed for package 'RCurl'". Immediately before that it said "Cannot find curl-config". It would appear that you don't have curl installed on your computer. Unfortunately, I cannot help you with that. But you might look at the documentation for RCurl (available from CRAN) to start trying to find where to get curl. -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 7/15/15, 8:23 PM, R-help on behalf of Muhammad Sohail Raza r-help-boun...@r-project.org on behalf of muhammadsohailr...@live.com wrote: Hi Everyone! I am trying to install the R package VariantAnnotation. I entered the commands: source("http://bioconductor.org/biocLite.R") biocLite("VariantAnnotation") But I am getting the following errors: ERROR: configuration failed for package 'XML' * removing '/usr/lib64/R/library/XML' * installing *source* package 'RCurl' ... ** package 'RCurl' successfully unpacked and MD5 sums checked configure: loading site script /usr/share/site/x86_64-unknown-linux-gnu checking for curl-config... no Cannot find curl-config ERROR: configuration failed for package 'RCurl' * removing '/usr/lib64/R/library/RCurl' * installing *source* package 'Rsamtools' ...
** libs gcc -std=gnu99 -I/usr/lib64/R/include -DNDEBUG -I/usr/local/include -I/usr/lib64/R/library/S4Vectors/include -I/usr/lib64/R/library/IRanges/include -I/usr/lib64/R/library/XVector/include -I/usr/lib64/R/library/Biostrings/include -fopenmp -D_USE_KNETFILE -D_FILE_OFFSET_BITS=64 -U_FORTIFY_SOURCE -DBGZF_CACHE -Dfprintf=_samtools_fprintf -Dexit=_samtools_exit -Dabort=_samtools_abort -I./samtools -I./samtools/bcftools -I./tabix -fpic -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -c Biostrings_stubs.c -o Biostrings_stubs.o gcc -std=gnu99 -I/usr/lib64/R/include -DNDEBUG -I/usr/local/include -I/usr/lib64/R/library/S4Vectors/include -I/usr/lib64/R/library/IRanges/include -I/usr/lib64/R/library/XVector/include -I/usr/lib64/R/library/Biostrings/include -fopenmp -D_USE_KNETFILE -D_FILE_OFFSET_BITS=64 -U_FORTIFY_SOURCE -DBGZF_CACHE -Dfprintf=_samtools_fprintf -Dexit=_samtools_exit -Dabort=_samtools_abort -I./samtools -I./samtools/bcftools -I./tabix -fpic -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -c IRanges_stubs.c -o IRanges_stubs.o g++ -I/usr/lib64/R/include -DNDEBUG -I/usr/local/include -I/usr/lib64/R/library/S4Vectors/include -I/usr/lib64/R/library/IRanges/include -I/usr/lib64/R/library/XVector/include -I/usr/lib64/R/library/Biostrings/include -fpic -fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -c PileupBuffer.cpp -o PileupBuffer.o /bin/sh: g++: command not found /usr/lib64/R/etc/Makeconf:143: recipe for target 'PileupBuffer.o' failed make: *** [PileupBuffer.o] Error 127 ERROR: compilation failed for package 'Rsamtools' * removing '/usr/lib64/R/library/Rsamtools' * installing *source* package 'futile.logger' ...
** package 'futile.logger' successfully unpacked and MD5 sums checked ** R ** preparing package for lazy loading ** help *** installing help indices ** building package indices ** testing if installed package can be loaded * DONE (futile.logger) ERROR: dependencies 'XML', 'RCurl' are not available for package 'biomaRt' * removing '/usr/lib64/R/library/biomaRt' * installing *source* package 'BiocParallel' ... ** R ** inst ** preparing package for lazy loading ** help *** installing help indices ** building package indices ** installing vignettes ** testing if installed package can be loaded * DONE (BiocParallel) ERROR: dependency 'Rsamtools' is not available for package 'GenomicAlignments' * removing '/usr/lib64/R/library/GenomicAlignments' ERROR: dependencies 'XML', 'RCurl', 'Rsamtools', 'GenomicAlignments' are not available for package 'rtracklayer' * removing '/usr/lib64/R/library/rtracklayer' ERROR: dependencies 'rtracklayer', 'Rsamtools' are not available for package 'BSgenome' * removing '/usr/lib64/R/library/BSgenome' ERROR: dependencies 'rtracklayer', 'biomaRt', 'RCurl' are not available for package 'GenomicFeatures' * removing '/usr/lib64/R/library/GenomicFeatures' ERROR: dependencies 'Rsamtools', 'BSgenome', 'rtracklayer', 'GenomicFeatures' are not available for package 'VariantAnnotation' * removing '/usr/lib64/R/library/VariantAnnotation' The downloaded source packages are in '/tmp/RtmpvpoYjO/downloaded_packages' Updating HTML index of packages in '.Library' Making 'packages.html' ... done Warning messages: 1: In install.packages(pkgs = doing, lib = lib, ...) : installation of package 'XML' had non-zero exit status 2: In install.packages(pkgs = doing, lib = lib, ...) : installation of package 'RCurl' had non-zero exit status 3: In install.packages(pkgs = doing,
Re: [R] problem with wilcox.test()
On 16 Jul 2015, at 15:13 , Ivan Calandra ivan.calan...@univ-reims.fr wrote: Dear useRs, I am running a wilcox.test() on two subsets of a dataset and get exactly the same results although the raw data are different in the subsets.

mydata <- structure(list(cat1 = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c("high", "low"), class = "factor"), cat2 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("large", "small"), class = "factor"), var1 = c(2.012743, 1.51272, 1.328453, 1.2609935, 1.617757, 1.8175455, 1.890035, 2.3652205, 1.295888, 1.5985145, 1.081813, 1.856733, 2.366358, 2.27421, 1.727023, 2.230433, 5.272843, 3.7626355), var2 = c(0.00196, 0.0066545, 0.006188, 0.0058985, 0.004453, 0.005468, 0.003773, 0.004742, 0.007525, 0.0081235, 0.004611, 0.0050475, 0.006643, 0.0097335, 0.009213, 0.0049525, 0.006243, 0.006021)), .Names = c("cat1", "cat2", "var1", "var2"), row.names = c(NA, 18L), class = "data.frame")

# p-values are identical but W different for the first variable
wilcox.test(var1~cat1, data=mydata[mydata$cat2=="large",])
wilcox.test(var1~cat1, data=mydata[mydata$cat2=="small",])
# both p-values and W are identical for the second variable
wilcox.test(var2~cat1, data=mydata[mydata$cat2=="large",])
wilcox.test(var2~cat1, data=mydata[mydata$cat2=="small",])

Did I do something wrong or does it just have something to do with my dataset? Or is it just a coincidence?

Coincidence, mostly, I think: You have

table(mydata[mydata$cat2=="small", "cat1"])
high  low
   4    5

table(mydata[mydata$cat2=="large", "cat1"])
high  low
   4    5

and all of your response variables' values are distinct. In both cases, the null distribution of the rank sum W is that of (sum(sample(1:9,4)) - sum(1:4)), which is a distribution on 0:20, symmetric around 10. Hence there are only 11 different p-values possible, so it is not particularly odd that you may get the same one twice.
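Peter's point can be checked by enumerating the exact null distribution of W for group sizes 4 and 5 with no ties:

```r
# All possible rank assignments of the "high" group (4 of ranks 1..9);
# W is the rank sum minus its minimum, so it lives on the integers 0..20.
W.all <- apply(combn(9, 4), 2, function(idx) sum(idx) - sum(1:4))
table(W.all)           # symmetric around 10, support 0:20
length(unique(W.all))  # 21 attainable W values
# Two-sided p-values depend only on min(W, 20 - W), giving just 11
# distinct p-values, so repeats across subsets are unremarkable.
```

With such a coarse discrete null distribution, two different datasets landing on the same p-value is exactly the "coincidence" described above.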
Thank you in advance for your help, Ivan -- Ivan Calandra, ATER University of Reims Champagne-Ardenne GEGENAA - EA 3795 CREA - 2 esplanade Roland Garros 51100 Reims, France +33(0)3 26 77 36 89 ivan.calan...@univ-reims.fr https://www.researchgate.net/profile/Ivan_Calandra __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] (no subject)
I am trying to analyse time-series .netcdf (3D: lat, long and time domain) climate data. I want to apply the SPEI package (calculation of the standardized precipitation evapotranspiration index) to it, but I am unable to arrange my data in the required data frame. As I am a beginner in R, it would be very helpful if someone could provide details of the code to be written before executing the package. The signature of the spei function is as follows: spei(data, scale, kernel = list(type = 'rectangular', shift = 0), distribution = 'log-Logistic', fit = 'ub-pwm', na.rm = FALSE, ref.start=NULL, ref.end=NULL, x=FALSE, params=NULL, ...) Thanks in advance. -- Prabir Kumar Das Scientist/Engineer - SD RRSC - East NRSC, ISRO Dept. of Space, Govt. of India Kolkata [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
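The thread never answers this, but a rough sketch of one common workflow is: read the lon x lat x time array with ncdf4, form the climatic water balance, and call spei() on each grid cell's time series. Everything specific here is assumed, not from the post: the file name "clim.nc", the variable names "prec" and "pet", monthly data, and that both variables exist in the file.

```r
library(ncdf4)  # assumed installed
library(SPEI)   # assumed installed

nc   <- nc_open("clim.nc")       # hypothetical file name
prec <- ncvar_get(nc, "prec")    # hypothetical variable: lon x lat x time
pet  <- ncvar_get(nc, "pet")     # hypothetical variable, same dims
nc_close(nc)

wbal <- prec - pet               # climatic water balance per cell
dims <- dim(wbal)
spei6 <- array(NA_real_, dims)   # holder for the 6-month SPEI
for (i in seq_len(dims[1])) {
  for (j in seq_len(dims[2])) {
    s <- ts(wbal[i, j, ], frequency = 12)  # monthly series assumed
    spei6[i, j, ] <- as.numeric(fitted(spei(s, scale = 6)))
  }
}
```

The double loop is slow for large grids; apply() over the first two margins, or parallelisation, would be the natural next step, but the cell-by-cell structure is the key idea.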
[R] R 3.2.0 - Windows 8, 1 - SSL certificate problem installing course in swirl
Hello, I am trying to install a course for swirl and got an SSL problem: install_from_swirl("R Programming") Error in function (type, msg, asError = TRUE) : SSL certificate problem: self signed certificate in certificate chain Found this answer with Google but it does not work either: set_config(config(ssl.verifypeer = 0L)) Error: could not find function "set_config" What do I do to avoid or accept the SSL certificate? Kind regards, Martin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls singular gradient matrix - fit parameters in integral's upper limits
The list rejects almost all attachments. You could dput the data and put it in your posting. You may also want to try a Marquardt solver, in R from my nlmrt or compiled in Kate Mullen's minpack.lm. They are slightly different in flavour, and the call is a bit different from nls. JN On 15-07-16 08:21 AM, Laura Teresa Corredor Bohórquez wrote: ---The last post rejected two files I had attached, so I modified it.--- Hi. I am trying to make an nls fit for a somewhat complicated expression that includes two integrals with two of the fit parameters in their upper limits. I got the error "Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates". First of all, I have already searched the previous answers, but they didn't help. The parameter initialization seems to be OK; I have tried to change the parameters but none work. If my function has just one integral everything works very nicely, but when adding a second integral term I just get the error. I don't believe the function is over-parametrized, as I have performed other fits with many more parameters and they worked. I did try to enclose the data but the attachment was rejected.
The minimal example is the following:

# read the data from a csv file
dados = read.csv("file.csv", header=FALSE, stringsAsFactors=FALSE)
x = 0*(1:97)
y = 0*(1:97)
for(i in 1:97){
  x[i] = dados[i,1]
  y[i] = dados[i,2]
}
integrand <- function(X) { return(X^4/(2*sinh(X/2))^2) }
fitting = function(T1, T2, N, D, x){
  int1 = integrate(integrand, lower=0, upper = T1)$value
  int2 = integrate(integrand, lower=0, upper = T2)$value
  return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2) + (448.956*(x/T1)^3*int1) + (299.304*(x/T2)^3*int2))
}
fit = nls(y ~ fitting(T1, T2, N, D, x), start=list(T1=400,T2=200,N=0.01,D=2))

--For reference, the fit that worked is the following:

# read the data from a csv file
dados = read.csv("file.csv", header=FALSE, stringsAsFactors=FALSE)
x = 0*(1:97)
y = 0*(1:97)
for(i in 1:97){
  x[i] = dados[i,1]
  y[i] = dados[i,2]
}
integrand <- function(X) { return(X^4/(2*sinh(X/2))^2) }
fitting = function(T1, N, D, x){
  int = integrate(integrand, lower=0, upper = T1)$value
  return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2) + (748.26)*(x/T1)^3*int)
}
fit = nls(y ~ fitting(T1, N, D, x), start=list(T1=400,N=0.01,D=2))

I cannot figure out what happens. I need to perform this fit for three integral components, but even for two I have this problem. I appreciate your help very much. Thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
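JN's Marquardt suggestion can be sketched as a near drop-in replacement for the failing nls() call, using minpack.lm::nlsLM (assumes the same `fitting` function and x, y vectors as in the quoted code; the control values are illustrative, not from the thread).

```r
# Levenberg-Marquardt is more tolerant of the near-singular Jacobians
# that make plain Gauss-Newton nls() fail at the starting values.
library(minpack.lm)

fit <- nlsLM(y ~ fitting(T1, T2, N, D, x),
             start   = list(T1 = 400, T2 = 200, N = 0.01, D = 2),
             control = nls.lm.control(maxiter = 200))
summary(fit)
```

If nlsLM also stalls, that usually points to the two integral terms being nearly collinear at the starting values, so the (T1, T2) pair is poorly identified; that would support simplifying or re-parameterizing the model rather than just switching solvers.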
[R] nls singular gradient matrix - fit parameters in integral's upper limits
---The last post rejected two files I had attached, so I modified it.--- Hi. I am trying to make an nls fit for a somewhat complicated expression that includes two integrals with two of the fit parameters in their upper limits. I got the error "Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates". First of all, I have already searched the previous answers, but they didn't help. The parameter initialization seems to be OK; I have tried to change the parameters but none work. If my function has just one integral everything works very nicely, but when adding a second integral term I just get the error. I don't believe the function is over-parametrized, as I have performed other fits with many more parameters and they worked. I did try to enclose the data but the attachment was rejected. The minimal example is the following:

# read the data from a csv file
dados = read.csv("file.csv", header=FALSE, stringsAsFactors=FALSE)
x = 0*(1:97)
y = 0*(1:97)
for(i in 1:97){
  x[i] = dados[i,1]
  y[i] = dados[i,2]
}
integrand <- function(X) { return(X^4/(2*sinh(X/2))^2) }
fitting = function(T1, T2, N, D, x){
  int1 = integrate(integrand, lower=0, upper = T1)$value
  int2 = integrate(integrand, lower=0, upper = T2)$value
  return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2) + (448.956*(x/T1)^3*int1) + (299.304*(x/T2)^3*int2))
}
fit = nls(y ~ fitting(T1, T2, N, D, x), start=list(T1=400,T2=200,N=0.01,D=2))

--For reference, the fit that worked is the following:

# read the data from a csv file
dados = read.csv("file.csv", header=FALSE, stringsAsFactors=FALSE)
x = 0*(1:97)
y = 0*(1:97)
for(i in 1:97){
  x[i] = dados[i,1]
  y[i] = dados[i,2]
}
integrand <- function(X) { return(X^4/(2*sinh(X/2))^2) }
fitting = function(T1, N, D, x){
  int = integrate(integrand, lower=0, upper = T1)$value
  return(N*(D/x)^2*(exp(D/x)/(1+exp(D/x))^2) + (748.26)*(x/T1)^3*int)
}
fit = nls(y ~ fitting(T1, N, D, x), start=list(T1=400,N=0.01,D=2))

I cannot figure out what happens.
I need to perform this fit for three integral components, but even for two I have this problem. I appreciate so much your help. Thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Weighted skewness and kurtosis
Is there an R package that allows one to calculate skewness and kurtosis, but weighted with individual-level weights (one weight per observation)? Thank you! -- Dimitri Liakhovitski __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] removing the columns with 0 or NA or 1or NA or 2 or NA
I have a matrix whose elements are NA, 0, 1, 2! I got my answer about removing the columns with only 0 or NA or both values, but now I want to add a further condition for deleting columns: I have to delete every column that contains a single value, i.e. delete the columns with only NA or 0 or both, the columns with only NA or 1 or both, and the columns with only NA or 2 or both (I should keep the columns which have variation in their values)! I used this code but it didn't work properly:

mat_nonNA <- mat[, !apply((is.na(mat) | mat == 0) & (is.na(mat) | mat == 1) & (is.na(mat) | mat == 2), 2, all)]

mat:
       1:110590170 1:110888172 1:110906406 1:110993854 1:110996710 1:44756
A05363           1           1           1           2          NA       0
A05370           0           1           0          NA           0      NA
A05380           1          NA           2          NA          NA       0
A05397           0           1           0          NA           0       2
A05400           2           1           0           2           0       0
A05426           0          NA          NA          NA           0       1

my output should be like below:

       1:110590170 1:110906406 1:44756
A05363           1           1       0
A05370           0           0      NA
A05380           1           2       0
A05397           0           0       2
A05400           2           0       0
A05426           0          NA       1

Thanks for your help [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
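The condition "delete columns with only NA plus a single value" is the same as "keep columns with at least two distinct non-NA values", which gives a shorter test than combining the three per-value masks. A sketch, not from the thread, rebuilding the poster's matrix to show it reproduces the desired output:

```r
# Rebuild the example matrix from the question (column-major fill)
mat <- matrix(c(1, 0, 1, 0, 2, 0,
                1, 1, NA, 1, 1, NA,
                1, 0, 2, 0, 0, NA,
                2, NA, NA, NA, 2, NA,
                NA, 0, NA, 0, 0, 0,
                0, NA, 0, 2, 0, 1),
              nrow = 6,
              dimnames = list(paste0("A0", c(5363, 5370, 5380, 5397, 5400, 5426)),
                              c("1:110590170", "1:110888172", "1:110906406",
                                "1:110993854", "1:110996710", "1:44756")))
# Keep a column only if it has two or more distinct non-NA values
keep <- apply(mat, 2, function(col) length(unique(na.omit(col))) > 1)
mat[, keep]  # drops the NA/1-only, NA/2-only and NA/0-only columns
```

The original attempt ANDs the three masks together, which only flags all-NA columns; the `length(unique(na.omit(col))) > 1` test captures the intended "variation" criterion directly.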
Re: [R] R 3.2.0 - Windows 8, 1 - SSL certificate problem installing course in swirl
On 16/07/2015 3:08 AM, MH wrote: Hello, I am trying to install a course for swirl and got an SSL problem: install_from_swirl("R Programming") Error in function (type, msg, asError = TRUE) : SSL certificate problem: self signed certificate in certificate chain Found this answer with Google but it does not work either: set_config(config(ssl.verifypeer = 0L)) Error: could not find function "set_config" What do I do to avoid or accept the SSL certificate? We don't know what install_from_swirl is doing, so it's hard to say. You should ask its author. If it is using the wininet method to download from an https site, then it is Windows that is issuing the error: you may be able to configure Internet Explorer to avoid it. Ask Microsoft how. Duncan Murdoch __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
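One detail worth noting beyond the reply above: set_config() comes from the httr package, which would explain the "could not find function" error if httr was never loaded. A hedged sketch of the workaround from the question (the exact argument name has varied across httr versions, so this is illustrative):

```r
# set_config() / config() are httr functions; load httr first.
library(httr)
set_config(config(ssl_verifypeer = 0L))  # older examples write ssl.verifypeer
```

Disabling peer verification accepts self-signed certificates globally for the session, so it is only a temporary diagnostic, not a fix for the underlying certificate problem.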
Re: [R] Weighted skewness and kurtosis
Unfortunately not - more like 0.7654, 1.2345. I understand that I could multiply each number by 100, round it to no decimal point and then unroll my data in proportion. I was just hoping someone had done it in C and put it into a package... On Thu, Jul 16, 2015 at 11:27 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights? -- David Winsemius Alameda, CA, USA -- Dimitri Liakhovitski __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted skewness and kurtosis
rep() **does** do it essentially in C!! See also the moments package, which I found instantly by googling "sample moments in R", though I don't know whether it does what you want (but probably shouldn't do). Of course, sample skewness and kurtosis are basically useless, but that's another, off-topic, issue. Cheers, Bert Bert Gunter Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. -- Clifford Stoll On Thu, Jul 16, 2015 at 8:37 AM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote: Unfortunately not - more like 0.7654, 1.2345. I understand that I could multiply each number by 100, round it to no decimal point and then unroll my data in proportion. I was just hoping someone had done it in C and put it into a package... On Thu, Jul 16, 2015 at 11:27 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights? -- David Winsemius Alameda, CA, USA -- Dimitri Liakhovitski __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted skewness and kurtosis
On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights? -- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] NLOPTR
Hello, I have been running a nonlinear GMM using the nloptr wrapper for the last 7 days. The maximum time I can run the code on the server I am using is 7 days, which expires in about an hour, when the server automatically terminates the job. I would like to know if there is a way to obtain the estimates at the point of termination so that I can use them as the new starting values for another round of estimation. Thank you. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted skewness and kurtosis
David: I think you missed my point. As I understand him, Dimitri mentioned rounding off to e.g. 2 decimal digits and multiplying by 100 to produce integer weights, which would then lead to unrolling the vector via rep(). Your weighted-moments reference is probably closer to what he sought, however. Cheers, Bert Bert Gunter Data is not information. Information is not knowledge. And knowledge is certainly not wisdom. -- Clifford Stoll On Thu, Jul 16, 2015 at 9:47 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:37 AM, Dimitri Liakhovitski wrote: Unfortunately not - more like 0.7654, 1.2345. I understand that I could multiply each number by 100, round it to no decimal point and then unroll my data in proportion. I was just hoping someone had done it in C and put it into a package... I saw Bert's comments but I don't think that `rep` handles fractional weights. Take a look at Hmisc::wtd.var (since I have that package loaded and see that the code is easy to grasp). It also appears, after using sos::findFn("weighted moments"), that the package lmomco deserves your attention. On Thu, Jul 16, 2015 at 11:27 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights? -- David Winsemius David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
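For completeness, the quantity the thread is after can be written directly from weighted central moments, with no integer-weight trick needed. This is a sketch of one common convention (definitions of weighted skewness and kurtosis vary, and no package in the thread is being reproduced here):

```r
# Weighted skewness and kurtosis from normalized weighted central moments;
# works with arbitrary fractional weights like 0.7654, 1.2345.
weighted.moments <- function(x, w) {
  w  <- w / sum(w)                 # normalize weights
  mu <- sum(w * x)                 # weighted mean
  m2 <- sum(w * (x - mu)^2)        # weighted variance (biased form)
  m3 <- sum(w * (x - mu)^3)
  m4 <- sum(w * (x - mu)^4)
  c(skewness = m3 / m2^1.5, kurtosis = m4 / m2^2)
}

x <- c(1.2, 0.4, 2.8, 5.1, 3.3)
w <- c(0.7654, 1.2345, 1.0, 0.5, 2.1)   # fractional weights, as in the thread
weighted.moments(x, w)
```

With all weights equal, this reduces to the usual (biased) sample skewness and kurtosis, which is a useful sanity check against an unweighted package function.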
Re: [R-es] Keeping the variable name across several functions: example results: defmacro solution
Hello: I have tried another solution: with defmacro from the gtools package:

=
DATOS <- data.frame(SE=c("M", "H", "M", "M", "H"), ED=c(50, 60, 20, 18, 30), GRP=c("B", "B", "A", "A", "B"))
library(gtools)
MRL <- defmacro(XDADES, XVD, XVI, expr = {
  RL <- glm(XVD ~ XVI, family=binomial, data=XDADES)
  print(summary(RL))
})
=

If it is run, the real variable names appear and not the argument names:

=
MRL(XDADES=DATOS, XVD=SE, XVI=ED)
Call: glm(formula = SE ~ ED, family = binomial, data = DATOS)
Deviance Residuals:
      1        2        3        4        5
 1,3511  -0,7869   0,6512   0,6159  -1,5435
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2,67170    2,60538   1,025    0,305
ED          -0,06142    0,06326  -0,971    0,332
[deleted]
=

In contrast, in function form the argument names appear in the results:

=
FRL <- function(XVD, XVI) {
  RL <- glm(XVD ~ XVI, family=binomial)
  print(summary(RL))
}
FRL(XVD=SE, XVI=ED)
Call: glm(formula = XVD ~ XVI, family = binomial)
Deviance Residuals:
      1        2        3        4        5
 1,3511  -0,7869   0,6512   0,6159  -1,5435
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2,67170    2,60538   1,025    0,305
XVI         -0,06142    0,06326  -0,971    0,332
[deleted]
=

Regards! On Wed, 15 Jul 2015 20:26:46 +0200 Griera gri...@yandex.com wrote: Hello: On Wed, 15 Jul 2015 00:18:32 +0200 Carlos Ortega c...@qualityexcellence.es wrote: [deleted] About the question of the names: if you pass it the data.frame as-is, it should keep the names. You are right. Now I pass it the name of the data.frame, and it already shows the name of the analysed variable. Many thanks for the suggestion. Regards! If not, pass the column/variable names to the functions as an additional argument... Regards, Carlos.
On 14 July 2015 at 22:49, Griera gri...@yandex.com wrote: Hello Carlos: I attach an application example: the functions (I have deleted the paths of the functions and the source() calls that load them) and an example of running them, for the options I have implemented, with the birthwt data table from the MASS package:
- Descriptive statistics of all the variables of a table.
- Univariate analysis of all the variables of a table crossed with a qualitative dependent variable.

=Start of functions

##--
## DESUNI
##--
DESUNI <- function(XDADES, XDROP=NULL, XVD=NULL,
                   XSPV=NULL  # if this is an SPV analysis; may be TRUE
                   ) {
  options(digits = 3, OutDec=",", scipen=999)
  ## No dependent variable: descriptives
  if (is.null(XVD)) {
    cat("\n*** Descriptives (no dependent variable)\n")
    DES(XDADES=XDADES, XDROP=XDROP, XCAMIF=XCAMIF)
  }
  ## Dependent variable present: univariate analysis
  else {
    UNI(XDADES=XDADES, XDROP=XDROP, XVD=XVD, XSPV=XSPV, XCAMIF=XCAMIF)
  }
}

##--
## DES: descriptives of all the variables
##--
DES <- function(XDADES, XDROP=NULL, XCAMIF) {
  # setdiff selects the variables of XDADES that are not in XDROP
  ifelse(is.null(XDROP),
         DADES_S <- XDADES,
         DADES_S <- XDADES[, setdiff(names(XDADES), XDROP)])
  attach(DADES_S, warn.conflicts = F)
  XVARLLI <- names(DADES_S)
  for (XVARNOM in names(DADES_S)) {
    if (is.numeric(get(XVARNOM))) {
      DES_QUANTI(XVARNOM)
    } else if (is.factor(get(XVARNOM))) {
      DES_QUALI(XVARNOM)
    } else {
      cat("La variable", XVARNOM, "no és de cap dels tipus coneguts\n")
    }
  }  # end of the function
  detach(DADES_S)
}

##--
## DES_QUANTI: descriptives of numeric variables
##--
DES_QUANTI <- function(X) {
  OP <- par(no.readonly = TRUE)  # save old parameters
  par(mfrow=c(1,3))
  hist(get(X), main=c("Histograma de", X), xlab=X); rug(get(X))
  boxplot(get(X), main=c("Diagrama de caixa de", X),
Re: [R] NLOPTR
1. I have no idea.

2. However, I doubt that your strategy would work anyway. If there is not an outright error, you are probably stuck in some endless loop or are wandering around at random on an essentially flat hypersurface. You would need to change the convergence criteria, or change the parameterization of and/or simplify your model, before proceeding if that is the case. Although I would have thought you would have run up against some sort of maximum-iterations limit...

Cheers, Bert

Bert Gunter "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll

On Thu, Jul 16, 2015 at 10:11 AM, Olu Ola via R-help r-help@r-project.org wrote: Hello, I have been running a nonlinear GMM using the nloptr wrapper for the last 7 days. The maximum time I can run the code on the server I am using is 7 days, which expires in about an hour, when the server automatically terminates it. I would like to know if there is a way to obtain the estimates at the point of termination so that I can use them as the new starting values for another round of estimation. Thank you.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rgl 3d surface
-Original Message- I compute its regression surface doing polynomial regression (fit) ... fit <- lm(z ~ poly(x,2) + poly(y,2)). So I want to represent the surface. How could I do it? Any idea?

You need to write a function f of x and y that produces the fitted values. I haven't checked, but I'd assume it needs to take vector inputs and produce a vector of responses.

Perhaps worth noting that since the fit was produced by lm(), predict() will generate the surface values without a separate function. For example, if we said something like [sorry, not yet checked]:

xvals <- seq(-35, -15, 0.5)
yvals <- seq(-35, -15, 0.5)
new.data <- data.frame(x = rep(xvals, each = length(yvals)),
                       y = rep(yvals, length(xvals)))
new.data$z <- predict(fit, newdata = new.data)  # the predicted surface
# Then:
with(new.data, persp3d(x, y, z))  # ought to do the job

Then persp3d(f) will draw the surface. See ?persp3d.function for details on setting the x and y ranges, etc.

Duncan Murdoch

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted skewness and curtosis
Thank you!

On Thu, Jul 16, 2015 at 11:45 AM, Bert Gunter bgunter.4...@gmail.com wrote: rep() **does** do it, essentially in C!! See also the moments package, which I found instantly by googling "sample moments in R", though I don't know whether it does what you want (and probably shouldn't, anyway). Of course, sample skewness and kurtosis are basically useless, but that's another, off-topic, issue. Cheers, Bert

Bert Gunter "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- Clifford Stoll

On Thu, Jul 16, 2015 at 8:37 AM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote: Unfortunately not - more like 0.7654, 1.2345. I understand that I could multiply each number by 100, round it to no decimal places and then unroll my data in proportion. I was just hoping someone had done it in C and put it into a package...

On Thu, Jul 16, 2015 at 11:27 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights? -- David Winsemius, Alameda, CA, USA

-- Dimitri Liakhovitski

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted skewness and curtosis
On Jul 16, 2015, at 8:37 AM, Dimitri Liakhovitski wrote: Unfortunately not - more like 0.7654, 1.2345. I understand that I could multiply each number by 100, round it to no decimal places and then unroll my data in proportion. I was just hoping someone had done it in C and put it into a package...

I saw Bert's comments but I don't think that `rep` handles fractional weights. Take a look at Hmisc::wtd.var (since I have that package loaded and can see that its code is easy to grasp). It also appears, after using sos::findFn("weighted moments"), that the package lmomco deserves your attention.

On Thu, Jul 16, 2015 at 11:27 AM, David Winsemius dwinsem...@comcast.net wrote: On Jul 16, 2015, at 8:10 AM, Dimitri Liakhovitski wrote: Is there an R package that allows one to calculate skewness and kurtosis - but weighted with individual-level weights (one weight per observation)? Integer weights?

-- David Winsemius, Alameda, CA, USA

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
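[Editor's note] For reference, the weighted moment-ratio estimators are short enough to hand-roll for fractional weights directly, with no rounding or rep() unrolling. This is a sketch, not any package's definition (the function name wmoments is mine; normalization and small-sample bias-correction conventions vary):

```r
# Weighted sample skewness and excess kurtosis from weighted central
# moments; the weights w may be fractional. These are the plain
# moment-ratio estimators, with no small-sample bias correction.
wmoments <- function(x, w) {
  w  <- w / sum(w)                 # normalize the weights
  mu <- sum(w * x)                 # weighted mean
  m2 <- sum(w * (x - mu)^2)        # weighted central moments
  m3 <- sum(w * (x - mu)^3)
  m4 <- sum(w * (x - mu)^4)
  c(skewness = m3 / m2^1.5, kurtosis = m4 / m2^2 - 3)
}

# Fractional weights like those in the question:
wmoments(c(1, 2, 3, 10), w = c(0.7654, 1.2345, 1, 1))
```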
Re: [R] matrix manipulation -solved
Yes, it is obvious --- once someone else points it out. Thanks for the hint. Terry T.

On 07/16/2015 12:52 PM, Peter Langfelder wrote: Hi Terry, maybe I'm missing something, but why not define a matrix BB = V'B; then t(B) %*% V = t(BB), and your problem reduces to finding A such that t(BB) %*% A = 0? Peter

On Thu, Jul 16, 2015 at 10:28 AM, Therneau, Terry M., Ph.D. thern...@mayo.edu wrote: This is as much a mathematics as an R question, in the "this should be easy but I don't see it" category. Assume I have a full-rank p by p matrix V (aside: V = (X'X)^{-1} for a particular setup) and a p by k matrix B, and I want to complete an orthogonal basis for the space with distance function V. That is, find A such that t(B) %*% V %*% A = 0, where A has p rows and p-k columns. With V = identity this is easy: I can do it in 1-2 lines using qr(), lm(), or several other tools. A part of me is quite certain that the general problem isn't more than 3 lines of R, but after a day of beating my head on the issue I still don't see it. Math-wise it looks like a simple homework problem in a mid-level class, but I'm not currently sure that I'd pass said class. If someone could show the way I would be grateful. Either that, or assurance that the problem actually IS hard and I'm not as dense as I think. Terry T.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
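[Editor's note] Peter's hint does reduce to the 3 lines Terry hoped for. A sketch (my own construction, not posted in the thread): form BB = t(V) %*% B and take the trailing columns of the complete Q from its QR decomposition, which are orthogonal to the column space of BB and hence satisfy t(B) %*% V %*% A = 0.

```r
# Complete a V-orthogonal basis: find A (p x (p-k)) with t(B) %*% V %*% A == 0.
p <- 5; k <- 2
set.seed(1)
X <- matrix(rnorm(20 * p), 20, p)
V <- solve(crossprod(X))              # V = (X'X)^{-1}, full-rank p x p
B <- matrix(rnorm(p * k), p, k)

BB <- crossprod(V, B)                 # t(V) %*% B, per Peter's hint
A  <- qr.Q(qr(BB), complete = TRUE)[, (k + 1):p, drop = FALSE]
max(abs(t(B) %*% V %*% A))            # effectively zero
```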
Re: [R] User defined function within a formula
Read about the 'makepredictcall' generic function. There is a method, makepredictcall.poly(), for poly() that attaches the polynomial coefficients used during the fitting procedure to the call to poly() that predict() makes. You ought to supply a similar method for your xpoly(), and xpoly() needs to return an object of a new class that will cause that method to be used. E.g.,

xpoly <- function(x, degree = 1, ...) {
    ret <- poly(x, degree = degree, ...)
    class(ret) <- "xpoly"
    ret
}
makepredictcall.xpoly <- function(var, call) {
    if (is.call(call)) {
        if (identical(call[[1]], quote(xpoly))) {
            if (!is.null(tmp <- attr(var, "coefs"))) {
                call$coefs <- tmp
            }
        }
    }
    call
}

g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
predict(g2, dc)
#             1              2              3              4              5
# -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
#             6              7              8              9
# -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608

You can see the effects of makepredictcall() in the 'terms' component of glm's output. The 'variables' attribute of it gives the original function calls and the 'predvars' attribute gives the calls to be used for prediction:

attr(g2$terms, "variables")
# list(lot1, log(u), xpoly(u, 1))
attr(g2$terms, "predvars")
# list(lot1, log(u), xpoly(u, 1, coefs = list(alpha = 40, norm2 = c(1, 9, 8850

Bill Dunlap TIBCO Software wdunlap tibco.com

On Thu, Jul 16, 2015 at 12:35 PM, Kunshan Yin yinkuns...@gmail.com wrote: Hello, I have a question about formulas and user-defined functions. I can do the following:

###Case 1:
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
g1 <- glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
dc <- clotting
dc$u <- 1
predict(g1, dc)
#           1           2           3           4           5           6           7           8           9
# -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929

However, if I simply wrap poly as a user-defined function (in reality I would have my own more complex function)
then I will get an error:

###Case 2:
xpoly <- function(x, degree = 1) poly(x, degree)
g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
predict(g2, dc)
# Error in poly(x, degree) : 'degree' must be less than number of unique points

It seems that predict always treats a user-defined function in the formula like I(). My question is: how can I get the results for Case 2 to be the same as Case 1? Does anyone have any idea about this? Thank you very much. Alex

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] User defined function within a formula
Hello, I have a question about formulas and user-defined functions. I can do the following:

###Case 1:
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
g1 <- glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
dc <- clotting
dc$u <- 1
predict(g1, dc)
#           1           2           3           4           5           6           7           8           9
# -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929

However, if I simply wrap poly as a user-defined function (in reality I would have my own more complex function), then I get an error:

###Case 2:
xpoly <- function(x, degree = 1) poly(x, degree)
g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
predict(g2, dc)
# Error in poly(x, degree) : 'degree' must be less than number of unique points

It seems that predict always treats a user-defined function in the formula like I(). My question is: how can I get the results for Case 2 to be the same as Case 1? Does anyone have any idea about this? Thank you very much. Alex

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Adjusting Probability for Oversampling
I am developing a marketing (churn) model that has an event rate of 0.5%, so I thought to perform oversampling; that is, making the number of events equal to the number of non-events by reducing the non-events (50-50 after sampling). After oversampling we need to adjust the predicted probabilities, as oversampling inflates the intercept. I always do this in logistic regression. Is the same adjustment required for decision trees, random forests, or other ensemble techniques? I am following Applied Predictive Modeling with R (the caret package). The author used the same technique (down-sampling) but did not adjust the predicted probabilities when scoring the validation and test data. Any help would be highly appreciated! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
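[Editor's note] For the logistic-regression case the poster already handles, the usual prior correction can be sketched as below (a hand-rolled illustration, not caret code; the function name adjust_prob is mine, and tau/ybar come from the question's 0.5% base rate and 50-50 sample). Whether tree or ensemble scores warrant the same rescaling is exactly the open question.

```r
# Prior correction: map a probability p predicted on the artificially
# balanced sample back to the original base rate.
#   tau  = true event rate in the population (0.5% here)
#   ybar = event rate in the down-sampled training data (50% here)
adjust_prob <- function(p, tau = 0.005, ybar = 0.5) {
  num <- p * tau / ybar
  num / (num + (1 - p) * (1 - tau) / (1 - ybar))
}

adjust_prob(0.5)  # 0.005: a 50% score on balanced data is just the base rate
```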
Re: [R] matching strings in a list
On Thu, Jul 16, 2015 at 1:00 PM, John McKown john.archie.mck...@gmail.com wrote: On Thu, Jul 16, 2015 at 12:40 PM, tryingtolearn inshi...@ymail.com wrote: Say I have a list: [[1]] "I like google" [[2]] "Hi Google google" [[3]] "what's up" and they are tweets. And I want to find out how many tweets mention google (the answer should be 2). If I string-split and unlist them, then I would get the answer of 3. How do I make sure I get just 2?

NROW(grep("google", list, ignore.case=TRUE))

Or

sum(grepl("google", x, ignore.case=TRUE))

-- Schrodinger's backup: The condition of any backup is unknown until a restore is attempted. Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be. He's about as useful as a wax frying pan. 10 to the 12th power microphones = 1 Megaphone Maranatha! John McKown

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
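[Editor's note] A quick check of the two one-liners, using the three tweets from the question (the object name tweets is assumed). The key point is that grep()/grepl() return one result per tweet, so repeated matches within a tweet are counted once:

```r
tweets <- list("I like google", "Hi Google google", "what's up")

# grepl gives one TRUE/FALSE per tweet, so summing counts tweets,
# not occurrences of the word:
sum(grepl("google", tweets, ignore.case = TRUE))    # 2

# the same count via grep(), which returns the matching indices:
length(grep("google", tweets, ignore.case = TRUE))  # 2
```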
Re: [R] Speeding up code?
Thanks Jim! This makes a huge difference. Can you explain why data frames are slower than a matrix? Any other suggestions on how to improve the code would be greatly appreciated. Thanks again! Ignacio

On Thu, Jul 16, 2015 at 1:42 PM jim holtman jholt...@gmail.com wrote: Actually, looking at the result, you don't need the transpose; that was an artifact of how you were doing it before.

xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
# convert to a data frame; no transpose needed
separoPairs <- as.data.frame(xm, stringsAsFactors = FALSE)

Jim Holtman Data Munger Guru "What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it."

On Thu, Jul 16, 2015 at 1:37 PM, jim holtman jholt...@gmail.com wrote: Here is one improvement. Avoid data frames in some of these cases. This creates a character matrix and then converts to a data frame after doing the transpose of the matrix. The whole thing takes less than 10 seconds on my system:

library(stringr)
# create a character matrix; avoid data frames in this case
print(proc.time())
#    user  system elapsed
#   15.52    5.24  587.70
xm <- do.call(rbind, str_split(string = AllpairsTmp, pattern = "-"))
# convert to a data frame; transpose the matrix, not the data frame
separoPairs <- as.data.frame(t(xm), stringsAsFactors = FALSE)
print(proc.time())
#    user  system elapsed
#   20.90    5.36  596.57

Jim Holtman Data Munger Guru "What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it."

On Thu, Jul 16, 2015 at 7:56 AM, Ignacio Martinez ignaci...@gmail.com wrote: Hi Collin, The objective of the gen.names function is to generate N *unique* random names, where N is a *large* number. On my computer `gen.names(n = 5)` takes under a second, so it is probably not the root problem in my code. That said, I would love to improve it. I'm not exactly sure how you propose to change it using sample. What is the object that I would be sampling?
I would love to run a little benchmark to compare my version with yours. What really takes a long time to run is:

separoPairs <- as.data.frame(str_split(string = AllpairsTmp, pattern = "-"))

So that, and the chunk of code before it, is probably where I would get big gains in speed. Sadly, I have no clue how to do it differently. Thanks a lot for the help!! Ignacio

On Wed, Jul 15, 2015 at 11:34 PM Collin Lynch cfly...@ncsu.edu wrote: Hi Ignacio, If I am reading your code correctly, the top while loop is essentially seeking to select a random set of names from the original set, then using unique to reduce it down; you then iterate until you have built your quota. Ultimately this results in a very inefficient attempt at sampling without replacement. Why not just sample without replacement rather than looping iteratively and using unique? Or, if the set of possible names is short enough, why not just randomize it and pull the first n items off? Best, Collin.

On Wed, Jul 15, 2015 at 11:15 PM, Ignacio Martinez ignaci...@gmail.com wrote: Hi R-Help! I'm hoping that some of you may give me some tips that could make my code more efficient. More precisely, I would like to make the answer to my stackoverflow http://stackoverflow.com/questions/31137940/randomly-assign-teachers-to-classrooms-imposing-restrictions question more efficient.
This is the code:

library(dplyr)
library(randomNames)
library(geosphere)

set.seed(7142015)  # Define parameters
n.Schools <- 20
first.grade <- 3
last.grade <- 5
n.Grades <- last.grade - first.grade + 1
n.Classrooms <- 20  # THIS IS WHAT I WANTED TO BE ABLE TO CHANGE
n.Teachers <- (n.Schools * n.Grades * n.Classrooms) / 2  # two classrooms per teacher

# Define random-names function:
gen.names <- function(n, which.names = "both", name.order = "last.first") {
  names <- unique(randomNames(n = n, which.names = which.names,
                              name.order = name.order))
  need <- n - length(names)
  while (need > 0) {
    names <- unique(c(randomNames(n = need, which.names = which.names,
                                  name.order = name.order), names))
    need <- n - length(names)
  }
  return(names)
}

# Generate n.Schools names
gen.schools <- function(n.schools) {
  School.ID <- paste0(gen.names(n = n.schools, which.names = "last"), ' School')
  School.long <- rnorm(n = n.schools, mean = 21.7672, sd = 0.025)
  School.lat <- rnorm(n = n.schools, mean = 58.8471, sd = 0.025)
  School.RE <- rnorm(n = n.schools, mean = 0, sd = 1)
  Schools <- data.frame(School.ID, School.lat, School.long, School.RE) %>%
    mutate(School.ID = as.character(School.ID)) %>%
    rowwise() %>%
    mutate(School.distance = distHaversine(
      p1 = c(School.long, School.lat),
      p2 = c(21.7672, 58.8471),
      r = 3961
    ))
  return(Schools)
}

Schools <- gen.schools(n.schools = n.Schools)
#
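[Editor's note] On the data-frame-vs-matrix question raised earlier in this thread: a matrix is a single atomic vector with a dim attribute, while a data frame is a list of columns carrying row names and S3 method dispatch, so an operation like t() on a data frame must first coerce it to a matrix. A rough illustration (my own toy sizes, not the poster's data):

```r
# A character matrix vs. the same data as a data frame.
m  <- matrix("x", nrow = 1e4, ncol = 10)
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Transposing the matrix touches one contiguous vector; transposing the
# data frame has to rebuild a matrix from the list of columns first.
system.time(for (i in 1:100) t(m))   # cheap
system.time(for (i in 1:100) t(df))  # pays the coercion cost every time
```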
Re: [R] (ordinal) logistic regression
Hi, your example is not reproducible. With ordinal regression the type of the y values is important; sometimes an ordered factor is required. Which ordinal regression to use depends on your hypothesis; see Ananth and Kleinbaum 1997. Functions/packages to look at, apart from the ordinal package: VGAM, MASS::polr, arm::bayespolr, rms::lrm. If you want to do GEE, that is another matter. Regards, Duncan

Duncan Mackay Department of Agronomy and Soil Science University of New England Armidale NSW 2351 Email: home: mac...@northnet.com.au

-Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Wolfgang Raffelsberger Sent: Friday, 17 July 2015 03:41 To: r-help@r-project.org Subject: [R] (ordinal) logistic regression

Dear list, I've been looking at previous posts on the list, but I haven't found any close enough to my question/problem. My data can be seen as a matrix of multiple individuals (columns) with (rather independent) measures (lines). Based on supplemental information, the individuals are organized in (multiple) ordered classes. Now I would like to test for which type of measure (ie which line from my matrix of data) the groups are distinct (eg differ by group mean). In other words, I would like to see in which line of my input matrix the measures for the groups of individuals associate with distinct group values. So I tried glm (or glm2) on each line of the matrix, but in my toy example (shown below) I'm surprised to get warnings about non-convergence in the nice toy cases (ie distinct groups, which is what I am looking for), even with glm2! I see that in such nice cases with glm() the Pr(>|z|) is close to 1, which at first sight is OK (since H0: coefficient = 0), but I suppose the test is not really set up right this way. When trying lrm (rms package) I even get an error message ("Unable to fit model using lrm.fit") with the nice cases.
In my real data with 4000 lines of data (ie 4000 glm tests), multiple-testing correction would transform everything from 1-p to end up at p=1, so that's another problem with this approach. I suppose somehow I should transform my data (I don't see where I would change the H0?) to obtain low and NOT high p-values (example below) in the case I'm looking for, ie when group means are distinct. Any suggestions? Thanks in advance, Wolfgang

Here is my toy example:

datB1 <- c(12,14:16,18:21,20:22,20,22:24,19.5) # fits partially/overlapping the 3-group model
datB2 <- c(11:15,21:25,31:36)   # too beautiful to be real ...
datB3 <- c(15:12,11:15,12:14,15:12) # no fit to the 3-group model
datB4 <- c(11:15,15:11,21:26)   # grpA == grpB but grpA != grpC
datB <- rbind(datB1,datB2,datB3,datB4)
set.seed(2015)
datB <- datB + round(runif(length(datB),-0.3,0.3),1) # add some noise
datB <- datB - rowMeans(datB) # centering
## here the definition of the groups
grpB <- gl(3,6,labels=LETTERS[1:3])[c(-6,-7)]
table(grpB)
## display
layout(1:4)
for(i in 1:4) plot(datB[i,],as.numeric(grpB))
## now the 'test'
glmLi <- function(dat,grp) { ## run glm: predict grp based on dat
  dat <- data.frame(dat=dat,grp=grp)
  glm(grp ~ dat, data=dat, family=binomial)}
logitB <- apply(datB,1,glmLi,grpB)
lapply(logitB,summary)
sapply(logitB,function(x) summary(x)$coefficients[,4])
# lines 1 and 2 are designed to be 'positive' but give high p-values with convergence problems

## for completeness
sessionInfo()
R version 3.2.0 (2015-04-16) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 [4] LC_NUMERIC=C LC_TIME=French_France.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] glm2_1.1.2 MASS_7.3-40 TinnRcom_1.0.18 formatR_1.2 svSocket_0.9-57 loaded via a namespace (and not attached): [1] tools_3.2.0 svMisc_0.9-70 tcltk_3.2.0 [[alternative
HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
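[Editor's note] Picking up Duncan's pointer to MASS::polr, a minimal sketch of an ordered-factor fit on made-up overlapping group data (not Wolfgang's exact toy matrix). polr requires the response to be an ordered factor, and, just as with glm, perfectly separated groups (like datB2) can still push the MLE toward infinity, so the toy data here overlap at the group boundaries:

```r
library(MASS)  # polr(); part of the standard R distribution

# Made-up measures for 3 ordered groups, with overlap at the boundaries:
dat <- c(12, 14, 15, 16, 13,        # group A
         14, 16, 17, 18, 15,        # group B
         17, 19, 20, 21, 22, 18)    # group C
grp <- factor(rep(LETTERS[1:3], c(5, 5, 6)), ordered = TRUE)

fit  <- polr(grp ~ dat, data = data.frame(dat, grp), Hess = TRUE)
ctab <- coef(summary(fit))  # Value / Std. Error / t value per term
# Approximate two-sided p-value for the slope from the t value:
2 * pnorm(abs(ctab["dat", "t value"]), lower.tail = FALSE)
```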
Re: [R] [FORGED] Re: Weighted skewness and curtosis
On 17/07/15 03:45, Bert Gunter wrote: SNIP Of course, sample skewness and kurtosis are basically useless, but that's another, off topic, issue. Fortune nomination! cheers, Rolf -- Technical Editor ANZJS Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] powerTransform warning message?
Hello, I have a series of 40 variables that I am trying to transform via the Box-Cox method using the powerTransform function in R. I have no zero values in any of my variables. When I run powerTransform on the full data set I get the following warning:

Warning message: In sqrt(diag(solve(res$hessian))) : NaNs produced

However, when I analyze the variables in groups rather than all 40 at a time, I do not get this warning. Why would this be? And does it mean the warning is safe to ignore? I would add that all of my lambda values are in the -5 to 5 range. I also get different lambda values when I analyze the variables together versus in groups. Is this to be expected? Thank you so much! Brittany [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] User defined function within a formula
OPoly <- function(x, degree = 1, weight = 1) {
  weight <- round(weight, 0)  # weights need to be integers
  if (length(weight) != length(x)) weight <- rep(1, length(x))
  p <- poly(4 * (rep(x, weight) - mean(range(x))) / diff(range(x)), degree)
  Z <- (t(t(p[cumsum(weight), ]) * sqrt(attr(p, "coefs")$norm2[-seq(2)]))[, degree])
  class(Z) <- "OPoly"
  Z
}

You need to give OPoly optional argument(s) that carry the original-regressor-dependent information into OPoly, and then have it return, as attributes, the values of those arguments. makepredictcall will take the attributes and attach them to the call in predvars, so predict uses values derived from the original regressors, not values derived from the data being predicted from. Take a look at a pair like makepredictcall.scale() and scale() for an example: scale has optional arguments 'center' and 'scale' that it returns as attributes, and makepredictcall.scale adds those to the call to scale that it is given. Thus when you predict, the scale and center arguments come from the original data, not from the data you are predicting from.

Bill Dunlap TIBCO Software wdunlap tibco.com

On Thu, Jul 16, 2015 at 3:43 PM, Kunshan Yin yinkuns...@gmail.com wrote: Thanks Bill for your quick reply. I tried your solution and it did work for the simple user-defined function xpoly. But when I try another function it gives me an error again:

OPoly <- function(x, degree = 1, weight = 1) {
  weight <- round(weight, 0)  # weights need to be integers
  if (length(weight) != length(x)) weight <- rep(1, length(x))
  p <- poly(4 * (rep(x, weight) - mean(range(x))) / diff(range(x)), degree)
  Z <- (t(t(p[cumsum(weight), ]) * sqrt(attr(p, "coefs")$norm2[-seq(2)]))[, degree])
  class(Z) <- "OPoly"
  Z
}

## This OPoly mimics a FORTRAN orthogonal-polynomial routine: it first maps x to the range [-2, 2], then does QR, then returns the results scaled by sqrt(norm2). Compared with poly, this transformation keeps the model coefficients in a range similar to the other variables, whereas the R poly routine will usually give you very large coefficients.
I did not find such a routine in R, so I had to define it as a user-defined function.

### I also have the following function, as you suggested:
makepredictcall.OPoly <- function(var, call) {
  if (is.call(call)) {
    if (identical(call[[1]], quote(OPoly))) {
      if (!is.null(tmp <- attr(var, "coefs"))) {
        call$coefs <- tmp
      }
    }
  }
  call
}

But I still get an error for the following:

g3 <- glm(lot1 ~ log(u) + OPoly(u,1), data = clotting, family = Gamma)
predict(g3, dc)
# Error in poly(4 * (rep(x, weight) - mean(range(x)))/diff(range(x)), degree) :
#   missing values are not allowed in 'poly'

I thought it might be due to the /diff(range(x)) in the function, but even if I remove that part it still gives me an error. Any idea? Many thanks in advance. Alex

On Thu, Jul 16, 2015 at 2:09 PM, William Dunlap wdun...@tibco.com wrote: Read about the 'makepredictcall' generic function. There is a method, makepredictcall.poly(), for poly() that attaches the polynomial coefficients used during the fitting procedure to the call to poly() that predict() makes. You ought to supply a similar method for your xpoly(), and xpoly() needs to return an object of a new class that will cause that method to be used. E.g.,

xpoly <- function(x, degree = 1, ...) {
    ret <- poly(x, degree = degree, ...)
    class(ret) <- "xpoly"
    ret
}
makepredictcall.xpoly <- function(var, call) {
    if (is.call(call)) {
        if (identical(call[[1]], quote(xpoly))) {
            if (!is.null(tmp <- attr(var, "coefs"))) {
                call$coefs <- tmp
            }
        }
    }
    call
}

g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
predict(g2, dc)
#             1              2              3              4              5
# -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608
#             6              7              8              9
# -0.01398928608 -0.01398928608 -0.01398928608 -0.01398928608

You can see the effects of makepredictcall() in the 'terms' component of glm's output.
The 'variables' attribute of it gives the original function calls and the 'predvars' attribute gives the calls to be used for prediction:

attr(g2$terms, "variables")
# list(lot1, log(u), xpoly(u, 1))
attr(g2$terms, "predvars")
# list(lot1, log(u), xpoly(u, 1, coefs = list(alpha = 40, norm2 = c(1, 9, 8850

Bill Dunlap TIBCO Software wdunlap tibco.com

On Thu, Jul 16, 2015 at 12:35 PM, Kunshan Yin yinkuns...@gmail.com wrote: Hello, I have a question about formulas and user-defined functions. I can do the following:

###Case 1:
clotting <- data.frame(
  u = c(5,10,15,20,30,40,60,80,100),
  lot1 = c(118,58,42,35,27,25,21,19,18),
  lot2 = c(69,35,26,21,18,16,13,12,12))
g1 <- glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
dc <- clotting
dc$u <- 1
predict(g1, dc) 1 2 3 4 5 6 7