Re: [R] How to add a variable to a dataframe whose values are conditional upon the values of an existing variable
You could also try a series of simple ifelse statements. I just tried the following and got it to work, though I am sure there is a faster way.

t = c("cow", "dog", "chick")
y = c(1, 3, 4)
mat = cbind(t, y)
mat = as.data.frame(mat)

> mat
      t y
1   cow 1
2   dog 3
3 chick 4

mat$g = ifelse(mat$t == "cow", 1, 6)
mat$g = ifelse(mat$t == "dog", 2, mat$g)
mat$g = ifelse(mat$t == "chick", 3, mat$g)

> mat
      t y g
1   cow 1 1
2   dog 3 2
3 chick 4 3

For days of the week it would only be 7 statements.

Andrew Miles
Department of Sociology
Duke University

On Feb 26, 2010, at 2:31 PM, Steve Matco wrote:

Hi everyone, I am at my wits' end with what I believe would be considered simple by a more experienced R user. I want to know how to add a variable to a dataframe whose values are conditional on the values of an existing variable. I can't seem to make an ifelse statement work for my situation. The existing variable in my dataframe is a character variable named DOW which contains abbreviated day names (SAT, SUN, MON, ..., FRI). I want to add a numerical variable named DOW1 to my dataframe that will take on the value 1 if DOW equals "SAT", 2 if DOW equals "SUN", 3 if DOW equals "MON", ..., 7 if DOW equals "FRI". I know this must be a simple problem but I have searched everywhere and tried everything I could think of. Any help would be greatly appreciated.

Thank you,
Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
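A more compact alternative (a sketch, not from the original thread) is to let match() do the recoding in one step: match() returns the position of each value in a lookup vector, which is exactly the 1-7 coding Mike asked for. The DOW vector here is hypothetical, built to match his description:

```r
# Hypothetical DOW vector, as described in the question
DOW <- c("SAT", "MON", "FRI", "SUN")

# Days listed in the desired numeric order; match() returns each value's position
day.order <- c("SAT", "SUN", "MON", "TUE", "WED", "THU", "FRI")
DOW1 <- match(DOW, day.order)
DOW1  # 1 3 7 2
```

This scales to any number of categories without a chain of ifelse() calls.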
[R] Interaction terms in logistic regression using glm
I recently became aware of the article by Ai and Norton (2003) about how interaction terms are problematic in nonlinear regression (such as logistic regression). They offer a correct way of estimating interaction effects and their standard errors.

My question is: does the glm() function take these corrections into account when estimating interaction terms for a logistic regression (i.e. when family=binomial)? If not, is there a function somewhere that allows for correct estimation?

I've looked at the documentation for glm and couldn't find an answer, nor have I seen the issue addressed in the forums or in the examples of logistic regression in R that I've found online.

Thanks!

Andrew Miles
Re: [R] Interaction terms in logistic regression using glm
Thanks for this, Kjetil. I see that I was getting interpretation and estimation confused. My apologies for not including the paper title. For the benefit of those who read this post after me, it is: Ai, Chunrong, and Edward C. Norton. 2003. "Interaction Terms in Logit and Probit Models." Economics Letters 80:123-129.

You mentioned that you found a number of papers not supporting their conclusions. I did a Google search and found one paper amending Ai and Norton's results: "The Treatment Effect, the Cross Difference, and the Interaction Term in Nonlinear Difference-in-Differences Models" by Patrick A. Puhani. Did you find others? If you wouldn't mind sending along the citations for what you found, that would be very helpful. Many thanks!

Andrew Miles

On Apr 29, 2010, at 10:45 PM, Kjetil Halvorsen wrote:

> see comments below.
>
> On Wed, Apr 28, 2010 at 4:29 PM, Andrew Miles wrote:
>> I recently became aware of the article by Ai and Norton (2003) about how
>> interaction terms are problematic in nonlinear regression (such as logistic
>> regression). They offer a correct way of estimating interaction effects and
>> their standard errors.
>>
>> My question is: Does the glm() function take these corrections into account
>> when estimating interaction terms for a logistic regression (i.e. when
>> family=binomial)?
>
> No.
>
>> If not, is there a function somewhere that allows for correct estimation?
>
> The estimation you get from glm is correct. The discussion in the paper you referred to
> is about how to interpret the estimation results! A Google search on the referred paper
> (you didn't give the title) shows various later papers referring to it,
> and not supporting their conclusions.
>
> Linear (and non-linear) model books badly need chapters with titles
> such as "post-estimation analysis". glm does the estimation for you.
> It cannot do the analysis for you!
>
> Probably you are looking for something such as CRAN package "effects".
> Kjetil
>
>> I've looked at the documentation for glm and couldn't find an answer, nor have
>> I seen the issue addressed in the forums or in the examples of logistic
>> regression in R that I've found online.
>>
>> Thanks!
>>
>> Andrew Miles
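To illustrate Kjetil's point that the issue is post-estimation interpretation rather than estimation, here is a minimal sketch (simulated data, not from the thread) of examining a fitted interaction on the probability scale with predict(); the interaction "effect" is how the x1 contrast in predicted probabilities differs across levels of x2:

```r
set.seed(1)
x1 <- rnorm(200)
x2 <- rbinom(200, 1, 0.5)
# simulate a binary outcome with a true x1:x2 interaction
y  <- rbinom(200, 1, plogis(-0.5 + 0.8 * x1 + 0.4 * x2 - 0.6 * x1 * x2))

mod <- glm(y ~ x1 * x2, family = binomial)

# predicted probabilities at low/high x1 within each level of x2
nd   <- expand.grid(x1 = c(-1, 1), x2 = c(0, 1))
nd$p <- predict(mod, newdata = nd, type = "response")
nd
```

Comparing the change in p across x1 at x2 = 0 versus x2 = 1 gives the probability-scale interaction that Ai and Norton discuss, which is not simply the coefficient on x1:x2.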
[R] Weighted descriptives by levels of another variable
I've noticed that R has a number of very useful functions for obtaining descriptive statistics on groups of variables, including summary {stats}, describe {Hmisc}, and describe {psych}, but none that I have found is able to provide weighted descriptives of subsets of a data set (e.g. descriptives for both males and females for age, where accurate results require use of sampling weights). Does anybody know of a function that does this?

What I've looked at already: I have looked at describe.by {psych}, which will give descriptives by levels of another variable (e.g. mean ages of males and females) but does not accept sampling weights. I have also looked at describe {Hmisc}, which allows for weights but has no functionality for subdivision. I tried using a by() function with describe {Hmisc}:

by(cbind(my, variables, here), division.variable, describe, weights=weight.variable)

but found that this returns an error message stating that the variables to be described and the weights variable are not the same length:

Error in describe.vector(xx, nam[i], exclude.missing = exclude.missing, :
  length of weights must equal length of x
In addition: Warning message:
In present & !is.na(weights) :
  longer object length is not a multiple of shorter object length

This comes about because the by() function passes down a subset of the variables to be described to describe(), but not a subset of the weights variable. describe() then searches whatever data set is attached in order to find the weights variable, but this is in its original (i.e. not subsetted) form. Here is an example using the ChickWeight dataset that comes in the "datasets" package.
data(ChickWeight)
attach(ChickWeight)
library(Hmisc)

# this gives descriptive data on the variables "Time" and "Chick" by levels of "Diet"
by(cbind(Time, Chick), Diet, describe)

# trying to add weights, however, does not work for reasons described above
wgt = rnorm(length(Chick), 12, 1)
by(cbind(Time, Chick), Diet, describe, weights=wgt)

Again, my question is, does anybody know of a function that combines both the ability to provide weighted descriptives with the ability to subdivide by the levels of some other variable?

Andrew Miles
Department of Sociology
Duke University
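One workaround (a sketch using only base R's weighted.mean(), not Hmisc's fuller describe()) is to split the whole data frame by the grouping factor, so that each group's weights are subset together with its data and no length mismatch can occur:

```r
set.seed(1)
data(ChickWeight)
# hypothetical sampling weights, as in the example above
ChickWeight$wgt <- rnorm(nrow(ChickWeight), 12, 1)

# split() carries the weight column along with each group's data,
# avoiding the length mismatch that by(..., weights = wgt) produces
res <- sapply(split(ChickWeight, ChickWeight$Diet),
              function(d) weighted.mean(d$Time, w = d$wgt))
res
```

The same pattern works with Hmisc's wtd.mean() or wtd.quantile() inside the anonymous function, since each piece d contains both the variables and their matching weights.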
Re: [R] Weighted descriptives by levels of another variable
Thanks! Using the plyr package and the approach you outlined seems to work well for relatively simple functions (like wtd.mean), but so far I haven't had much success in using it with more complex descriptive functions like describe {Hmisc}. I'll take a look later, though, and see if I can figure out why. At any rate, ddply() looks like it will simplify writing a function that will allow for weighting data and subdividing it, but still give comprehensive summary statistics (i.e. not just the mean or quantiles, but all in one). I'll post it to the list once I have the time to write it up.

I also took a stab at using the svyby function in the survey package, but received the following error message when I input:

> svyby(cbind(educ, age), female, svynlsy, svymean)
Error in `[.survey.design2`(design, byfactor %in% byfactor[i], ) :
  (subscript) logical subscript too long

__

In addition to using the survey package (and the svyby function), I've found that many of the 'weighted' functions, such as wtd.mean, work well with the plyr package. For example:

wtdmean = function(df) wtd.mean(df$obese, df$sampwt)
ddply(mydata, ~cut2(age, c(2,6,12,16)), 'wtdmean')

hth, david freedman

Andrew Miles-2 wrote:

I've noticed that R has a number of very useful functions for obtaining descriptive statistics on groups of variables, including summary {stats}, describe {Hmisc}, and describe {psych}, but none that I have found is able to provide weighted descriptives of subsets of a data set (e.g. descriptives for both males and females for age, where accurate results require use of sampling weights). Does anybody know of a function that does this? What I've looked at already: I have looked at describe.by {psych}, which will give descriptives by levels of another variable (e.g. mean ages of males and females) but does not accept sampling weights. I have also looked at describe {Hmisc}, which allows for weights but has no functionality for subdivision.
I tried using a by() function with describe {Hmisc}:

by(cbind(my, variables, here), division.variable, describe, weights=weight.variable)

but found that this returns an error message stating that the variables to be described and the weights variable are not the same length:

Error in describe.vector(xx, nam[i], exclude.missing = exclude.missing, :
  length of weights must equal length of x
In addition: Warning message:
In present & !is.na(weights) :
  longer object length is not a multiple of shorter object length

This comes about because the by() function passes down a subset of the variables to be described to describe(), but not a subset of the weights variable. describe() then searches whatever data set is attached in order to find the weights variable, but this is in its original (i.e. not subsetted) form. Here is an example using the ChickWeight dataset that comes in the "datasets" package.

data(ChickWeight)
attach(ChickWeight)
library(Hmisc)

# this gives descriptive data on the variables "Time" and "Chick" by levels of "Diet"
by(cbind(Time, Chick), Diet, describe)

# trying to add weights, however, does not work for reasons described above
wgt = rnorm(length(Chick), 12, 1)
by(cbind(Time, Chick), Diet, describe, weights=wgt)

Again, my question is, does anybody know of a function that combines both the ability to provide weighted descriptives with the ability to subdivide by the levels of some other variable?

Andrew Miles
Department of Sociology
Duke University
[R] Error message using mi() in mi package
Hello! I get the following message when I run the mi() function from the mi package.

Error while imputing variable: c3 , model: mi.polr
Error in eval(expr, envir, enclos) : could not find function "c14ordered"

Here's the situation: I am running R v. 2.9.2 on Mac OS X v. 10.5.8. I am trying to impute missing data in a data set that I've trimmed down to 302 variables. I've created an mi.info() object on the data, and I've updated the "type" of variable where necessary so that the mi() imputing function uses the most appropriate type of models to estimate imputed values. The data have no collinearity. I then run the mi function like this:

imp = mi(imp.data, info=info2, n.iter=10)

where imp.data is my data set of 302 variables and info2 is my modified mi.info object. I get the message as posted above. The message only occurs when working on a variable I have labeled as "ordered-categorical." But the mi() function processes most variables labeled as "ordered-categorical" just fine. In fact, if I shrink my data set (say, to just 5 variables) I can get mi() to process a problematic variable just fine as well.

I'm not sure what the function "c14ordered" is that the error message refers to. My first thought is maybe it is referring to one of my variables in my data? Variable names in my data follow a basic letter-number pattern (i.e. a1, a2, etc.), but there is no c14, rather c14a1, c14a2, etc. So I'm not sure if the variable has anything to do with the problem, but I thought I'd mention it just in case someone wiser in this matter than I can see something I cannot. I cannot post code that reproduces the problem due to the nature of the code and data involved.

Any help would be appreciated, as I am not sure what is happening, and can't see why I can sometimes impute a variable labeled as "ordered-categorical" and sometimes cannot. Thanks!
Andrew Miles
Re: [R] Error message using mi() in mi package
On Jul 6, 2010, at 1:30 PM, Peter Ehlers wrote:

On 2010-07-06 10:37, Andrew Miles wrote:

Hello! I get the following message when I run the mi() function from the mi package.

Error while imputing variable: c3 , model: mi.polr
Error in eval(expr, envir, enclos) : could not find function "c14ordered"

Here's the situation: I am running R v. 2.9.2 on Mac OS X v. 10.5.8. I am trying to impute missing data in a data set that I've trimmed down to 302 variables. I've created an mi.info() object on the data, and I've updated the "type" of variable where necessary so that the mi() imputing function uses the most appropriate type of models to estimate imputed values. The data have no collinearity. I then run the mi function like this:

imp = mi(imp.data, info=info2, n.iter=10)

where imp.data is my data set of 302 variables and info2 is my modified mi.info object. I get the message as posted above. The message only occurs when working on a variable I have labeled as "ordered-categorical." But the mi() function processes most variables labeled as "ordered-categorical" just fine. In fact, if I shrink my data set (say, to just 5 variables) I can get mi() to process a problematic variable just fine as well. I'm not sure what the function "c14ordered" is that the error message refers to. My first thought is maybe it is referring to one of my variables in my data? Variable names in my data follow a basic letter-number pattern (i.e. a1, a2, etc.), but there is no c14, rather c14a1, c14a2, etc. So I'm not sure if the variable has anything to do with the problem, but I thought I'd mention it just in case someone wiser in this matter than I can see something I cannot. I cannot post code that reproduces the problem due to the nature of the code and data involved. Any help would be appreciated, as I am not sure what is happening, and can't see why I can sometimes impute a variable labeled as "ordered-categorical" and sometimes cannot. Thanks!
Andrew Miles

This looks suspiciously like a syntax problem. I would get my text editor to search for 'c14ordered' in the code. You might have missed some punctuation.

-Peter Ehlers

A good thought. I checked my own code (the stuff coding the data and using the mi package) and, for good measure, the code of the mi.polr function in the mi package, but the string "c14ordered" never appears.
Re: [R] Error message using mi() in mi package
On Jul 6, 2010, at 2:15 PM, Erik Iverson wrote:

This looks suspiciously like a syntax problem. I would get my text editor to search for 'c14ordered' in the code. You might have missed some punctuation.

-Peter Ehlers

A good thought. I checked my own code (the stuff coding the data and using the mi package) and, for good measure, the code of the mi.polr function in the mi package, but the string "c14ordered" never appears.

And what does

> traceback()

tell you?

Here is the original error message:

Error while imputing variable: a8 , model: mi.polr
Error in eval(expr, envir, enclos) : could not find function "c14ordered"

I confess I'm not good at reading traceback() output, but here is what it says:

18: eval(expr, envir, enclos)
17: eval(predvars, data, env)
16: model.frame.default(formula = form, data = data, drop.unused.levels = FALSE)
15: model.frame(formula = form, data = data, drop.unused.levels = FALSE)
14: model.frame(formula = form, data = data, drop.unused.levels = FALSE)
13: eval(expr, envir, enclos)
12: eval(expr, p)
11: eval.parent(m)
10: bayespolr(formula = form, data = data, start = 0, method = c("logistic"),
        drop.unused.levels = FALSE, n.iter = n.iter)
 9: mi.polr(formula = "a8 ~ rural_now + a3 + a7 . . .
[more arguments here including the rest of the variables in the data set and a list of all the actual data], start = NULL, n.iter = 100, draw.from.beta = TRUE)
 8: do.call(model.type, args = c(list(formula = info[[CurrentVar]]$imp.formula,
        data = dat, start = if (!is.null(start.val[[i]][[jj]])) {
            start.val[[i]][[jj]]
        } else {
            NULL
        }), info[[CurrentVar]]$params))
 7: eval(expr, envir, enclos)
 6: eval(substitute(expr), data, enclos = parent.frame())
 5: with.default(data = dat, do.call(model.type, args = c(list(formula = info[[CurrentVar]]$imp.formula,
        data = dat, start = if (!is.null(start.val[[i]][[jj]])) {
            start.val[[i]][[jj]]
        } else {
            NULL
        }), info[[CurrentVar]]$params)))
 4: with(data = dat, do.call(model.type, args = c(list(formula = info[[CurrentVar]]$imp.formula,
        data = dat, start = if (!is.null(start.val[[i]][[jj]])) {
            start.val[[i]][[jj]]
        } else {
            NULL
        }), info[[CurrentVar]]$params)))
 3: .local(object, ...)
 2: mi(imp.data, info = info2, n.iter = 10)
 1: mi(imp.data, info = info2, n.iter = 10)
Re: [R] how to calculate summary statistics for each factor
You could try Summarize in the NCStats package, or aggregate in the epicalc package.

Andrew Miles
Department of Sociology
Duke University

On Jul 6, 2010, at 11:53 AM, karena wrote:

I have a dataset like the following:

subject class value
123     1     10
124     1     12
125     1     12
223     2     23
224     2     18
225     2     19
323     3     21
324     3     10
325     3     19
326     3     20

how to calculate the summary value for each factor? thanks

karena
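Base R also handles this without any extra packages: tapply() or aggregate() will give per-factor summaries. A sketch using data modeled on karena's example:

```r
# Example data modeled on the question (class is the grouping factor)
d <- data.frame(subject = c(123, 124, 125, 223, 224, 225, 323, 324, 325, 326),
                class   = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)),
                value   = c(10, 12, 12, 23, 18, 19, 21, 10, 19, 20))

# mean of value within each level of class
agg <- aggregate(value ~ class, data = d, FUN = mean)
agg

# or a full five-number summary per level
tapply(d$value, d$class, summary)
```

aggregate() returns a data frame (handy for further processing), while tapply() returns a list of summaries, one per factor level.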
Re: [R] how to plot two histograms overlapped in the same plane coordinate
I'm not sure what you are trying to do. Do you want one histogram for males and one for females on the same graph? If so, the simplest way to put two histograms together is to simply use the add parameter:

age.males = age[which(sex == "M")]
age.females = age[which(sex == "F")]
hist(age.males, col="blue")
hist(age.females, add=T)

The only problem is that the hist() function does not do semi-transparency. I am not sure if other packages do. The code above will give you a blue histogram for males, and a clear histogram for females on top of it. You'll probably have to manually alter the axes of the histogram to give the histograms for males and females the same break points (i.e. where one bar stops and another begins). See ?hist for more information about that.

Andrew Miles
Department of Sociology
Duke University

On Jul 9, 2010, at 9:29 AM, Mao Jianfeng wrote:

Dear R-help listers, I am new. I just want to get help on how to plot two histograms overlapped in the same plane coordinate. What I did is very ugly. Could you please help me to improve it? I want to get a plot with a semi-transparent overlapping region. And, I want to know how to specify the fill colors of the different histograms. I would also prefer solutions other than ggplot2. Many thanks to you. What I have done:

library(ggplot2)
age <- c(rnorm(100, 1.5, 1), rnorm(100, 5, 1))
sex <- c(rep("F", 100), rep("M", 100))
mydata <- cbind(age, sex)
mydata <- as.data.frame(mydata)
head(mydata)
qplot(age, data=mydata, geom="histogram", fill=sex, xlab="age", ylab="count", alpha=I(0.5))

Best,
Mao J-F
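For what it's worth, base graphics can overlap semi-transparent histograms on devices that support alpha (e.g. pdf(), quartz, or cairo-based png()): pass fill colors with an alpha channel via rgb(). A sketch with simulated data like Mao's:

```r
set.seed(1)
age <- c(rnorm(100, 1.5, 1), rnorm(100, 5, 1))
sex <- rep(c("F", "M"), each = 100)

# shared break points so the two histograms' bars line up
brk <- pretty(range(age), n = 20)

# fourth argument to rgb() is the alpha (transparency) level
hist(age[sex == "F"], breaks = brk, col = rgb(0, 0, 1, 0.5),
     main = "Age by sex", xlab = "age")
hist(age[sex == "M"], breaks = brk, col = rgb(1, 0, 0, 0.5), add = TRUE)
```

Using one breaks vector for both calls addresses the shared-break-points issue mentioned above; the overlap region renders purple where the two alpha fills stack.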
[R] Changing model parameters in the mi package
I am trying to use the mi package to impute data, but am running into problems with the functions it calls. For instance, I am trying to impute a categorical variable called "min.func." The mi() function calls the mi.categorical() function to deal with this variable, which in turn calls the nnet.default() function and passes it a fixed parameter MaxNWts=1500. However, as you can see below, the min.func variable needs a higher threshold than 1500. Is there a way that I can change the MaxNWts parameter that is being sent to nnet.default()? I've investigated the mi() and mi.info() functions and cannot see a way. Thanks for your help!

Error message and system info below:

Beginning Multiple Imputation ( Wed Jul 14 10:25:06 2010 ):
Iteration 1
Imputation 1 : min.func*
Error while imputing variable: min.func , model: mi.categorical
Error in nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE, softmax = TRUE, :
  too many (2608) weights

System is Mac OS X 10.5.8, R version 2.9.2

Andrew Miles
Re: [R] Changing model parameters in the mi package
I figured this one out. I'm sending the solution back to the help list so that there is a record of how to deal with this problem, since it does not seem to be documented elsewhere.

The mi.info object that you pass to the mi() function contains a value called "params" which allows you to set the parameters used in each imputation model. If you access it like so:

info = mi.info(impute.data)
info[["min.func"]]$params

you are presented with a matrix showing a value for only two parameters that are passed to every model, n.iter and draw.from.beta. You can add any other parameters that you want like so:

info[["min.func"]]$params$MaxNWts = 3000

This will add the parameter MaxNWts to the min.func model. This seems to have solved the problem I outlined below.

Andrew Miles

On Jul 14, 2010, at 10:33 AM, Andrew Miles wrote:

I am trying to use the mi package to impute data, but am running into problems with the functions it calls. For instance, I am trying to impute a categorical variable called "min.func." The mi() function calls the mi.categorical() function to deal with this variable, which in turn calls the nnet.default() function and passes it a fixed parameter MaxNWts=1500. However, as you can see below, the min.func variable needs a higher threshold than 1500. Is there a way that I can change the MaxNWts parameter that is being sent to nnet.default()? I've investigated the mi() and mi.info() functions and cannot see a way. Thanks for your help!
Error message and system info below:

Beginning Multiple Imputation ( Wed Jul 14 10:25:06 2010 ):
Iteration 1
Imputation 1 : min.func*
Error while imputing variable: min.func , model: mi.categorical
Error in nnet.default(X, Y, w, mask = mask, size = 0, skip = TRUE, softmax = TRUE, :
  too many (2608) weights

System is Mac OS X 10.5.8, R version 2.9.2

Andrew Miles
[R] Error using the mi package
I'm trying to impute data using the mi package, but after running through almost the entire first round of imputations (which takes quite a while), it throws this error (I'll include the whole output prior to the error for context). Does anyone know what is causing it, or how I can fix it? More specifically, how can I tell what is throwing the error so I know what to fix? Is it a problem with a variable? With the formula I am using for imputations?

Beginning Multiple Imputation ( Thu Jul 15 09:10:33 2010 ):
Iteration 1
Imputation 1 : min.func* Tenure* Salary* Housing* ord_stat* rural_now* a3* a7* a8* a9* a10* a12* a13* a14* a15* a16* a17a* a17b* a17c* a17d* a17e* a17f* a18* a21* b1a* b1b* b1c* b1d* b1e* b1f* b1g* b1h* b1i* b3a* b3b* b3c* b3d* b3e* b3f* b3g* b4a* b4b* b4c* b4d* b4e* b4f* b4g* b4h* b4i* b4j* b4k* b4l* b4m* b4n* b4o* b4p* b5a* b5b* b5c* b5d* b5e* b5f* b5g* b5h* b5i* b5j* b5k* b5l* b5m* b5n* b5o* b6a* b6b* b6c* b6d* b6e* b6f* b6g* b6h* b6i* b6j* b6k* b6l* b7a* b7b* b7c* b7d* b7e* b7f* b7g* b7h* b7i* b7j* b7k* b9b* b10a* b10b* b13a* b13b* b13c* b14* c3* c4* c5* c12a* c12b* c12c* c12d* c12e* c12f* c19* d1* d2* d3* d6* d7a* d7b* d7c* d7d* d7e* d7f* d7g* d7h* d7i* d8a* d8b* d8c* d8d* d8e* d8f* d8g* d8h* d8i* d9a* d9b* d9c* d9d* d9e* d10a* d10b* d10c* d10d* d10e* d10f* d11* d13* d14* e1* e2* e3* e5* e6* f2* f3a* f3b* f3c* f3d* f3e* f4* f6* f7* f8* f9* f12* f14* f15* f16* f20* f21* f22* f23* f25* g2* g3* g6* g8* g9* g12* g13* g15* g17* g18* g22* g25* g26* g27* g28* g31* g34* g43* g55* g58* g59* g60* g61* g63* g65* g68a* g68b* g69* g70* g71* g91* g92* g93* g94* g95*

Error in AveVar[s, i, ] <- c(avevar.mean, avevar.sd) :
  number of items to replace is not a multiple of replacement length

And here is what traceback() gives:

> traceback()
3: .local(object, ...)
2: mi(imp.data, info = info2, n.iter = 6, preprocess = FALSE)
1: mi(imp.data, info = info2, n.iter = 6, preprocess = FALSE)

Andrew Miles
[R] Standard Error for difference in predicted probabilities
Is there a way to estimate the standard error for the difference in predicted probabilities obtained from a logistic regression model? For example, this code gives the difference in the predicted probability when x2==1 vs. when x2==0, holding x1 constant at its mean:

y = rbinom(100, 1, .4)
x1 = rnorm(100, 3, 2)
x2 = rbinom(100, 1, .7)
mod = glm(y ~ x1 + x2, family=binomial)
pred = predict(mod, newdata=data.frame(cbind(x1=rep(mean(x1), 100), x2)), type="response")
diff = unique(pred)[1] - unique(pred)[2]
diff

I know that predict() will output SEs for each predicted value, but how do I generate a SE for the difference in those predicted values? Thanks in advance!

Andrew Miles
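One common approach (a sketch, not from the thread; the delta method is the analytic alternative) is to simulate coefficient vectors from their estimated asymptotic normal distribution and take the standard deviation of the implied differences in predicted probabilities. MASS::mvrnorm does the simulation:

```r
library(MASS)  # for mvrnorm

set.seed(1)
y  <- rbinom(100, 1, .4)
x1 <- rnorm(100, 3, 2)
x2 <- rbinom(100, 1, .7)
mod <- glm(y ~ x1 + x2, family = binomial)

# draw coefficient vectors from N(beta-hat, vcov(mod))
B <- mvrnorm(5000, coef(mod), vcov(mod))

X1 <- c(1, mean(x1), 1)   # intercept, x1 at its mean, x2 = 1
X0 <- c(1, mean(x1), 0)   # same covariates, but x2 = 0
diffs <- plogis(B %*% X1) - plogis(B %*% X0)

c(diff = mean(diffs), se = sd(diffs))
```

The sd of the simulated differences approximates the standard error of the difference in predicted probabilities, and quantile(diffs, c(.025, .975)) gives a simulation-based confidence interval.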
Re: [R] bptest
First load the package lmtest, then run the bptest:

library(lmtest)
bptest(modelCH)

You don't need to tell the function which variables are explanatory or dependent. Just give it the fitted model object, and it will sort all of that out and return the statistic.

Andrew Miles
Department of Sociology
Duke University

On Sep 24, 2010, at 9:53 AM, robm wrote:

Hi, I'm very new to R but have plenty of experience with statistics and other packages like SPSS, SAS etc. I have a dataset of around 20 columns and 200 rows. I'm trying to fit a very simple linear model between two variables. Having done so, I want to test the model for heteroscedasticity using the Breusch-Pagan test. Apparently this is easy in R by simply doing:

bptest(modelCH, data=KP)

I've tried this but I'm told it cannot find function bptest. It's here where I'm struggling. I'm probably wrong but as far as I can see, bptest is part of the lm package which, as far as I know, I have installed. Irrespective of the fact that I'm not sure how to tell bptest which are the dependent and explanatory variables, there's a more fundamental problem if it can't find the bptest function. I have searched the documentation, albeit briefly, so if anyone could help I'd be very grateful.

Rob
QBE Management
Re: [R] Multiple graph in one graph window
Another option is to use the par() command before executing a plot command, like so:

par(mfrow=c(2,2))
plot(...)

This will put the next four plots all in the same window.

Andrew Miles
Department of Sociology
Duke University

On Sep 24, 2010, at 10:28 PM, Dennis Murphy wrote:

> Which one do you want?
>
> library(lattice)
> d <- data.frame(x = rep(1:3, 4), g = factor(rep(1:4, each = 3)),
>                 y = c(1, 2, 3, 3, 1, 7, 4, 2, 11, 5, 5, 9))
> xyplot(y ~ x | g, data = d, pch = 16)
> xyplot(y ~ x, data = d, groups = g, type = c('p', 'l'), pch = 16)
>
> Both provide 'multiple plots' and both are on the 'same page'.
>
> HTH,
> Dennis
>
> On Fri, Sep 24, 2010 at 2:37 PM, wrote:
>
>> Dear R users;
>> Could you tell me how to let R create multiple graphs in a graph window?
>> Please note that I am talking about creating multiple plots in the same page
>> of the graph window. For example:
>> g1=c(1,2,3)
>> g2=c(3,1,7)
>> g3=c(4,2,11)
>> g4=c(5,5,9)
>> ...
>> all in one graph window.
>> Best Regards
>> Reza
[R] formatting data for predict()
I'm trying to get predicted probabilities out of a regression model, but am having trouble with the "newdata" option in the predict() function. Suppose I have a model with two independent variables, like this: y=rbinom(100, 1, .3) x1=rbinom(100, 1, .5) x2=rnorm(100, 3, 2) mod=glm(y ~ x1 + x2, family=binomial) I can then get the predicted probabilities for the two values of x1, holding x2 constant at 0 like this: p2=predict(mod, type="response", newdata=as.data.frame(cbind(x1, x2=0))) unique(p2) However, I am running regressions as part of a function I wrote, which feeds in the independent variables to the regression in matrix form, like this: dat=cbind(x1, x2) mod2=glm(y ~ dat, family=binomial) The results are the same as in mod. Yet I cannot figure out how to input information into the "newdata" option of predict() in order to generate the same predicted probabilities as above. The same code as above does not work: p2a=predict(mod2, type="response", newdata=as.data.frame(cbind(x1, x2=0))) unique(p2a) Nor does creating a data frame that has the names "datx1" and "datx2," which is how the variables appear if you run a summary() on mod2. Looking at the model matrix of mod2 shows that the fitted model only shows two variables, the dependent variable y and one independent variable called "dat." It is as if my two variables x1 and x2 have become two levels in a factor variable called "dat." names(mod2$model) My question is this: if I have a fitted model like mod2, how do I use the "newdata" option in the predict function so that I can get the predicted values I am after? I.E. how do I recreate a data frame with one variable called "dat" that contains two levels which represent my (modified) variables x1 and x2? Thanks in advance! 
Andrew Miles Department of Sociology Duke University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
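One way around the problem described above — offered as a sketch, not a solution from the thread — is to fit the model with a data frame rather than a matrix, so that each column keeps its own name and predict() can match columns in newdata by name:

```r
set.seed(1)
y  <- rbinom(100, 1, .3)
x1 <- rbinom(100, 1, .5)
x2 <- rnorm(100, 3, 2)

dat  <- data.frame(x1, x2)                      # data frame, not cbind()
mod3 <- glm(y ~ ., data = dat, family = binomial)

# newdata columns now match the fitted terms by name
p3 <- predict(mod3, type = "response",
              newdata = data.frame(x1 = c(0, 1), x2 = 0))
p3
```

With a matrix regressor, glm() treats the whole matrix as a single term named "dat", which is why newdata built from x1 and x2 cannot be matched.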
Re: [R] Sample size estimation for non-inferiority log-rank and Wilcoxon rank-sum tests
I haven't done much with the type of data you're working with, but here is a post that lists a few packages for doing sample size calculations in R. Perhaps one of them will be helpful. https://stat.ethz.ch/pipermail/r-help/2008-February/154223.html Andrew Miles On Sep 27, 2010, at 2:09 PM, Paul Miller wrote: > Hello Everyone, > > I'm trying to conduct a couple of power analyses and was hoping > someone might be able to help. I want to estimate the sample size > that would be necessary to adequately power a couple of > non-inferiority tests. The first would be a log-rank test and the second > would be a Wilcoxon rank-sum test. I want to be able to determine > the sample size that would be necessary to test for a 3 day > difference in median recovery time between 2 groups of cancer > patients. > > Both of these tests are infeasible using SAS Proc Power and I > haven't been able to find information about how to do them using > either SAS or R. > > Does anyone know how to perform either of these calculations? If so, > I'd greatly appreciate it if you could share a couple of examples. > > > Thanks, > > Paul > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] statistic test plot
If you type: class(z.ex) you'll see that your z-test produces an object with a class "htest." Then type: methods("plot") and you'll get a list of all the types of objects that the plot() function knows how to make plots for. It doesn't look like plot has a sub-method for an htest object, so it probably doesn't know how to handle the type of data you are putting into it. Andrew Miles Department of Sociology Duke University On Sep 27, 2010, at 11:03 PM, Kie Kyon Huang wrote: Hi, I am a beginner in R and is trying to plot some dot plot from t test result. I was following the example ## Example from z.test -- requires TeachingDemos package # library(TeachingDemos) # z.ex <- z.test(rnorm(25,100,5),99,5) # z.ex # plot(z.ex) but encounter this error for the last command Error in plot.window(...) : need finite 'xlim' values In addition: Warning messages: 1: In min(x) : no non-missing arguments to min; returning Inf 2: In max(x) : no non-missing arguments to max; returning -Inf 3: In min(x) : no non-missing arguments to min; returning Inf 4: In max(x) : no non-missing arguments to max; returning -Inf the z.ex looks fine though One Sample z-test data: rnorm(25, 100, 5) z = 0.7622, n = 25, Std. Dev. = 5, Std. Dev. of the sample mean = 1, p-value = 0.4459 alternative hypothesis: true mean is not equal to 99 95 percent confidence interval: 97.80223 101.72216 sample estimates: mean of rnorm(25, 100, 5) 99.7622 Can someone help me out here? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using a package function inside another function
Try adding a statement at the beginning of your function: require(micEcon) See if that helps. Andrew Miles Department of Sociology Duke University On Oct 7, 2010, at 11:47 AM, Alison Callahan wrote: Hello all, I am trying to use the micEcon 'insertRow' function inside a function I have written. For example: insert_row_test <- function(m){ insertRow(m,nrow(m)+1,v=0,rName="test") } However, when I try to call the 'insert_row_test' function (after loading the micEcon package), it does not insert a row into the matrix I pass in. When I call the insertRow function exactly as above in the R console, it works with no problem. Can anyone tell me why this is, and how to fix this problem? Thank you! Alison Callahan PhD candidate Department of Biology Carleton University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
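A sketch of that fix, using the call from the original post (assumes the micEcon package is installed):

```r
insert_row_test <- function(m) {
  require(micEcon)  # ensure insertRow() is on the search path inside the function
  insertRow(m, nrow(m) + 1, v = 0, rName = "test")
}

m <- matrix(1:4, nrow = 2)
insert_row_test(m)
```

Note also that insertRow() returns a new matrix rather than modifying m in place, so the result must be assigned (e.g. m <- insert_row_test(m)) to be kept — another common reason an "inserted" row seems to vanish.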
Re: [R] how can i do anova
Type ?anova on your R command line for the basic function, and links to related functions. Also, try a google search of something like "doing anova in R" and you should find multiple tutorials or examples. Andrew Miles On Oct 11, 2010, at 11:33 AM, Mauluda Akhtar wrote: Hi, I've a table like the following. I want to do ANOVA. Could you please tell me how can i do it. I want to show whether the elements (3 for each column) of a column are significantly different or not. Just to inform you that i'm a new user of "R" bp_30048741 bp_30049913 bp_30049953 bp_30049969 bp_30049971 bp_30050044 [1,] 69 46 43 54 54 41 [2,] 68 22 39 31 31 22 [3,] 91 54 57 63 63 50 Thank you. Moon [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R on a Mac
R works great on my Mac. In fact, the user interface in some ways seems to be more friendly (ex. you type an open parenthesis, it automatically includes a close parenthesis; color coding for coding files, etc.) Andrew Miles On Oct 14, 2010, at 2:49 PM, David Cross wrote: I have had no problems with R on my Mac ... just download, install, and run ... let me know if you have any problems. Cheers David Cross d.cr...@tcu.edu www.davidcross.us On Oct 14, 2010, at 11:25 AM, Tiffany Kinder wrote: Hello, Is R very compatible with a Mac? A colleague of mine indicated that everyone he knows with a Mac has problems with R. What can you tell me about using R with a Mac. What do I need to download? I have downloaded the basic R package. Thanks, -- Tiffany Kinder MS Student Department of Watershed Science Utah State University tiffany.kin...@aggiemail.usu.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Many Thanks
And thank YOU for taking the time to express your gratitude. I'm sure all those who regularly take the time to contribute to the list appreciate the appreciation. Andrew Miles On Oct 15, 2010, at 9:49 AM, Jumlong Vongprasert wrote: Dear R-help mailing list and software development team. After I have used R a few weeks, I was exposed to the best of the program. In addition, the R-help mailing list a great assist new users. I do my job as I want and get great support from the R-help mailing list. Thanks R-help mailing list. Thanks software development team. Jumlong __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem exporting data using write.foreign
My question is about the write.foreign() command in the foreign package. I use a command like the following to try and output data and a code file to read my data into SAS. write.foreign(data.frame.object, datafile="filepath", codefile="filepath", package="SAS", dataname="myData") With my data set, it gives the following error: Error in make.SAS.names(names(df), validvarname = validvarname) : Cannot uniquely abbreviate the variable names to 32 or fewer characters I tried to write reproducible code but could not. I'm not sure where to go from here. What are the naming protocols for variables so that they can be exported using write.foreign()? Thanks! Andrew Miles __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem exporting data using write.foreign
I was able to figure this one out. It turns out that truncation was not the problem, as I had no variables with names longer than 32 characters (that is quite a long name!). I document my process here so that future users with the same problem can benefit. First, I examined the code for the function throwing the error: foreign:::make.SAS.names This revealed that the error message I was getting was thrown any time the transformed/reformatted variable name was too long or duplicated another variable name: if (any(nchar(x) > nmax) || any(duplicated(x))) stop("Cannot uniquely abbreviate the variable names to ", nmax, " or fewer characters") I was able to isolate which variables were the problem as follows: which(duplicated(x)) where x is the vector of transformed variable names which were transformed using make.SAS.names code: x <- sub("^([0-9])", "_\\1", varnames) x <- gsub("[^a-zA-Z0-9_]", "_", x) x <- abbreviate(x, minlength = nmax) This allowed me to discover that I had two variables, retire_sp and retire.sp, which became the identical "retire_sp" after transformation. Simple, really, but frustrating until you get into the code and see what is happening, and how to isolate the offending variables. Andrew Miles On Oct 20, 2010, at 1:18 PM, Nordlund, Dan (DSHS/RDA) wrote: -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Andrew Miles Sent: Wednesday, October 20, 2010 10:10 AM To: r-help@r-project.org Subject: [R] Problem exporting data using write.foreign My question is about the write.foreign() command in the foreign package. I use a command like the following to try and output data and a code file to read my data into SAS. 
write.foreign(data.frame.object, datafile="filepath", codefile="filepath", package="SAS", dataname="myData") With my data set, it gives the following error: Error in make.SAS.names(names(df), validvarname = validvarname) : Cannot uniquely abbreviate the variable names to 32 or fewer characters I tried to write reproducible code but could not. I'm not sure where to go from here. What are the naming protocols for variables so that they can be exported using write.foreign()? Thanks! Andrew Miles Well, the error message tells you that the names must be unique when truncated to 32 characters. Apparently, you have at least 2 variables that have the same name when truncated to 32 characters. Hope this is helpful, Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple imputation for nominal data
There are a couple of packages that do MI, including MI for nominal data. The most recent of these is "mi", but I believe "mice" might do it as well. Both are available on the CRAN, and both have useful articles that teach you how to use them. The citations for these articles can be found at the bottom of the help page that appears by typing ?mi OR for mice ?mice mi is the newer package and has some useful control features, but as it is newer it still is under development. Andrew Miles On Nov 2, 2010, at 3:38 PM, John Sorkin wrote: I am looking for an R function that will run multiple imputation (perhaps fully conditional imputation, MICE, or sequential generalized regression) for non-MVN data, specifically nominal data. My dependent variable is dichotomous, all my predictors are nominal. I have a total of 4,500 subjects, 1/2 of whom are missing the main independent variables. I would appreciate any suggestions that the users of the listserver might have. John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Confidentiality Statement: This email message, including any attachments, is for th...{{dropped: 6}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
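A minimal sketch of the mice workflow, using that package's built-in nhanes example data (variable choices here are illustrative; see the package's own articles for real usage):

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)  # 5 imputed data sets
fit <- with(imp, lm(bmi ~ age + hyp))                    # fit the model in each
pool(fit)                                                # combine via Rubin's rules
```

mice() picks an imputation method per column type (e.g. polytomous regression for unordered factors), which is what makes it usable for nominal data.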
Re: [R] Counting
You could try something like this: Loop through your bootstrapped samples and store which ones have the outlier you are looking for using code like: count = c(count, outlier.value %in% boot.sample$outlier.variable) Then sum the count vector and subtract it from the total number of samples to get the number of samples without the outlier: N.nooutlier = Total - sum(count) Andrew Miles On Nov 16, 2010, at 4:55 PM, ufuk beyaztas wrote: Hi dear all, i have a data (data.frame) which contain y and x coloumn(i.e. y x 1 0.58545723 0.15113102 2 0.02769361 -0.02172165 3 1.00927527 -1.80072610 4 0.56504053 -1.12236685 5 0.58332337 -1.24263981 6 -1.70257274 0.46238255 7 -0.88501561 0.89484429 8 1.14466282 0.34193875 9 0.58827457 0.15923694 10 -0.79532232 -1.44193770) i changed the first data points by an outlier (i.e. y x 1 1025 2 0.02769361 -0.02172165 3 1.00927527 -1.80072610 4 0.56504053 -1.12236685 5 0.58332337 -1.24263981 6 -1.70257274 0.46238255 7 -0.88501561 0.89484429 8 1.14466282 0.34193875 9 0.58827457 0.15923694 10 -0.79532232 -1.44193770 ) then i generate the 1000 bootstrap sample with this data set, some of them not contain these outliers, some of them contain once and some of them contain many time... Now i want to count how many samples not contain these outliers. Thank so much any idea! -- View this message in context: http://r.789695.n4.nabble.com/Counting-tp3045756p3045756.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
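If the bootstrapped samples are kept in a list, the count can also be done in one pass; the object names here are assumptions about the setup, not from the original post:

```r
# boot.samples: a list of 1000 bootstrapped data frames, each with columns y and x
# outlier.value: the substituted value (1025 in the example above)
has.outlier <- sapply(boot.samples, function(s) outlier.value %in% s$y)
N.nooutlier <- sum(!has.outlier)  # samples that do NOT contain the outlier
```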
Re: [R] plot linear model problem
There may be an easier way to do this, but you could always just do it the long way. Ex. plot(residuals(test.lm)~fitted.values(test.lm)) Andrew Miles On Nov 16, 2010, at 5:01 PM, casperyc wrote: Hi all, Say I fit a linear model, and saved it as 'test.lm' Then if I use plot(test.lm) it gives me 4 graphs How do I ask for a 'subset' of it?? say just want the 1st graph, the residual vs fitted values, or the 1,3,4th graph? I think I can use plot(test.lm[c(1,3,4)]) before, but now, it's not working... Every time, it goes to the end, the only thing I can click is 'next', it is impossible to save them individually I am using xp sp3, R 2.12. Thanks! casper -- View this message in context: http://r.789695.n4.nabble.com/plot-linear-model-problem-tp3045763p3045763.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
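plot.lm also takes a which argument that selects individual diagnostic plots directly, which may be simpler than rebuilding them by hand. A quick sketch using the built-in cars data:

```r
test.lm <- lm(dist ~ speed, data = cars)  # example model on built-in data

plot(test.lm, which = 1)           # residuals vs fitted only
plot(test.lm, which = c(1, 3, 4))  # diagnostic plots 1, 3, and 4
```

See ?plot.lm for the numbering of the six available plots.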
Re: [R] draw categorical histogram
Try: plot(myCatVariable) Andrew Miles Department of Sociology Duke University On Dec 1, 2010, at 2:51 PM, phoebe kong wrote: Hi, Can someone tell me how to draw a histogram for the following summary? Richard Minnie Albert Helen Joe Kingston 12 33 56 67 15 66 The summary tell that Richard has occurrence 12, Minnie has occurrence 33, and so on. I would like to view this summary in a histogram. I want the X-axis be the person name (Richard, Minnie, ), Y-axis be the frequency (12, 33). How can I make the histogram has 6 bars, each bar was named as Richard, Minnie, ... , Kingston? Thanks, Phoebe [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] draw categorical histogram
Phoebe, In addition to the barplot method below, you really can use plot() to draw what it sounds like you are looking for IF you have a categorical variable. To illustrate, try running the following code: x=sample(c("Richard", "Minnie", "Albert", "Helen", "Joe", "Kingston"), 50, replace=T) x=as.factor(x) plot(x) See also ?plot.factor Andrew Miles On Dec 1, 2010, at 4:06 PM, Jorge Ivan Velez wrote: Hi Phoebe, Try x <- c(12, 33, 56, 67, 15, 66) names(x) <- c('Richard','Minnie','Albert','Helen','Joe','Kingston') barplot(x, las = 1, space = 0) HTH, Jorge On Wed, Dec 1, 2010 at 2:51 PM, phoebe kong <> wrote: Hi, Can someone tell me how to draw a histogram for the following summary? Richard Minnie Albert Helen Joe Kingston 12 33 56 67 15 66 The summary tell that Richard has occurrence 12, Minnie has occurrence 33, and so on. I would like to view this summary in a histogram. I want the X-axis be the person name (Richard, Minnie, ), Y-axis be the frequency (12, 33). How can I make the histogram has 6 bars, each bar was named as Richard, Minnie, ... , Kingston? Thanks, Phoebe [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] xyplot
Try sample() which will allow you to randomly select 10 ID's from your ID variable, which you can then plot. Andrew Miles On Dec 16, 2010, at 10:49 AM, Rasanga Ruwanthi wrote: Hi I am using following code to produce a xyplot for some longitudinal data. There are 2 panels. It produced all longitudinal trajectories with mean profile. But since the dataset it very large plot looks very messy. I want to show, say 10 randomly selected individual longitudinal trajectories together with mean profile for entire dataset. Could any help me to alter the following code to do this? or is there an alternative way? Thanks in advance. xyplot(Y~ time|status,groups=ID,data=heart, type="l",lty=1, layout=c(2,1),main="", panel = function (x, y, subscripts, groups, ...) { panel.superpose (x, y, panel.groups = "panel.xyplot",subscripts,groups, ...) panel.loess (x, y, col = "Red", lwd = 2,span=0.75, ...) }) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
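A sketch of the sample() idea applied to the code above (heart, ID, Y, time, and status are the original poster's names; the subset step is an assumption about the data layout):

```r
library(lattice)

set.seed(1)
ids     <- sample(unique(heart$ID), 10)    # 10 randomly chosen subjects
heart10 <- subset(heart, ID %in% ids)      # keep only their trajectories

xyplot(Y ~ time | status, groups = ID, data = heart10,
       type = "l", lty = 1, layout = c(2, 1),
       panel = function(x, y, subscripts, groups, ...) {
         panel.superpose(x, y, panel.groups = "panel.xyplot",
                         subscripts, groups, ...)
         panel.loess(x, y, col = "Red", lwd = 2, span = 0.75, ...)
       })
```

One caveat: the loess curve is then the mean profile of the 10 sampled subjects only; overlaying the mean profile of the full data set would require passing the full data into the panel function separately.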
Re: [R] Title for y-axis on right side
Take a look at mtext() which offers options for writing text in any margin of the table. Andrew Miles On Dec 17, 2010, at 6:41 AM, phils_mu...@arcor.de wrote: Hi, I want to have a title for the y-axis on the right side of the plot. I know how to do it on the left side: title(ylab="Title for y-axis") But how can I have the title on the right side? Greets, Phil __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
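For example (the margin width and line offset are guesses that usually need adjusting per plot):

```r
par(mar = c(5, 4, 4, 4) + 0.1)   # widen the right margin to make room
plot(1:10, ylab = "")
mtext("Title for y-axis", side = 4, line = 3)  # side = 4 is the right margin
```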
Re: [R] After heteroskedasticity correction, how can I get new confidential interval?
It is hard to say without knowing more about the type of model you are running. A good place to look would be at the coeftest() function in the package lmtest. Andrew Miles On Dec 20, 2010, at 10:04 AM, JoonGi wrote: I just corrected std.error of my 'model'(Multi Regression). Then how can I get new t and p-values? Isn't there any R command which shows new t and p values? -- View this message in context: http://r.789695.n4.nabble.com/After-heteroskedasticity-correction-how-can-I-get-new-confidential-interval-tp3095643p3095643.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] robust standard error of an estimator
It depends on what you mean by "robust." Robust to what? I recommend looking at the sandwich package which gives heteroskedasticity and autocorrelation robust variance/covariance matrices. For instance, you could do the following to get your OLS estimates with heteroskedasticity consistent SEs library(sandwich) library(lmtest) reg=lm(fsn~lctot) coeftest(reg, vcov=vcovHC(reg)) Or to get cluster robust SEs, check out this: people.su.se/~ma/ clustering.pdf Hope that helps. Andrew Miles On Jan 1, 2011, at 10:09 AM, Charlène Cosandier wrote: > Hi, > > I have ove the robust standard error of an estimator but I don't > know how to > do this. > The code for my regression is the following: > reg<-lm(fsn~lctot) > But then what do I need to do? > > -- > Charlène Lisa Cosandier > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] OT: Reducing pdf file size
I assume you mean PDFs generated by R. This topic has been addressed here: http://tolstoy.newcastle.edu.au/R/e2/help/07/05/17475.html I have always just output the graphics then used an external PDF program (like Preview on the Mac) to do changes in file type, size reductions, etc. Andrew Miles On Jan 5, 2011, at 3:12 PM, kurt_h...@nps.gov wrote: > Greetings > Does anyone have any suggestions for reducing pdf file size, > particularly pdfs containing photos, without sacrificing quality? > Thanks > for any tips in advance. > Cheers > Kurt > > *** > Kurt Lewis Helf, Ph.D. > Ecologist > EEO Counselor > National Park Service > Cumberland Piedmont Network > P.O. Box 8 > Mammoth Cave, KY 42259 > Ph: 270-758-2163 > Lab: 270-758-2151 > Fax: 270-758-2609 > > Science, in constantly seeking real explanations, reveals the true > majesty > of our world in all its complexity. > -Richard Dawkins > > The scientific tradition is distinguished from the pre-scientific > tradition > in having two layers. Like the latter it passes on its theories but > it > also passes on a critical attitude towards them. The theories are > passed > on not as dogmas but rather with the challenge to discuss them and > improve > upon them. > -Karl Popper > > ...consider yourself a guest in the home of other creatures as > significant > as yourself. > -Wayside at Wilderness Threshold in McKittrick Canyon, Guadalupe > Mountains > National Park, TX > > Cumberland Piedmont Network (CUPN) Homepage: > http://tiny.cc/e7cdx > > CUPN Forest Pest Monitoring Website: > http://bit.ly/9rhUZQ > > CUPN Cave Cricket Monitoring Website: > http://tiny.cc/ntcql > > CUPN Cave Aquatic Biota Monitoring Website: > http://tiny.cc/n2z1o > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memisc-Tables with robost standard errors
I always use apsrtable in the apsrtable package, which allows you to specify a vcov matrix using the "se" option. The only trick is that you have to append it to your model object, something like this: fit=lm(y ~ x) fit$se=vcovHC(fit) apsrtable(fit, se="robust") Andrew Miles On Jan 5, 2011, at 7:57 PM, Jan Henckens wrote: Hello, I've got a question concerning the usage of robust standard errors in regression using lm() and exporting the summaries to LaTeX using the memisc-packages function mtable(): Is there any possibility to use robust errors which are obtained by vcovHC() when generating the LateX-output by mtable()? I tried to manipulate the lm-object by appending the "new" covariance matrix but mtable seems to generate the summary itself since it is not possible to call mtable(summary(lm1)). I'd like to obtain a table with the following structure (using standard errors I already worked out how to archieve it): Variable & Coeff. & robust S.E. & lower 95% KI & upper 95% KI \\ Var1 & x.22^(*) & (x) & [x & x] \\ . . . Maybe someone has any suggestions how to implement this kind of table? Best regards, Jan Henckens -- jan.henckens | jöllenbecker str. 58 | 33613 bielefeld tel 0521-5251970 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regression Testing
Perhaps the easiest way to incorporate the heteroskedasticity consistent SE's and output them in a familiar and easy to interpret format is to use coeftest() in the lmtest package. coeftest(myModel, vcov=vcovHC(myModel)) Andrew Miles On Jan 20, 2011, at 4:42 PM, Achim Zeileis wrote: On Thu, 20 Jan 2011, Mojo wrote: I'm new to R and some what new to the world of stats. I got frustrated with excel and found R. Enough of that already. I'm trying to test and correct for Heteroskedasticity I have data in a csv file that I load and store in a dataframe. ds <- read.csv("book2.csv") df <- data.frame(ds) I then preform a OLS regression: lmfit <- lm(df$y~df$x) Just btw: lm(y ~ x, data = df) is somewhat easier to read and also easier to write when the formula involves more regressors. To test for Heteroskedasticity, I run the BPtest: bptest(lmfit) studentized Breusch-Pagan test data: lmfit BP = 11.6768, df = 1, p-value = 0.0006329 From the above, if I'm interpreting this correctly, there is Heteroskedasticity present. To correct for this, I need to calculate robust error terms. That is one option. Another one would be using WLS instead of OLS - or maybe FGLS. As the model just has one regressor, this might be possible and result in a more efficient estimate than OLS. From my reading on this list, it seems like I need to vcovHC. That's another option, yes. vcovHC(lmfit) (Intercept) df$x (Intercept) 1.057460e-03 -4.961118e-05 df$x -4.961118e-05 2.378465e-06 I'm having a little bit of a hard time following the help pages. Yes, the manual page is somewhat technical but the first thing the "Details" section does is: It points you to some references that should be easier to read. I recommend starting with Zeileis A (2004), Econometric Computing with HC and HAC Covariance Matrix Estimators. _Journal of Statistical Software_, *11*(10), 1-17. URL http://www.jstatsoft.org/v11/i10/>. That has also some worked examples. 
So is the first column the intercept and the second column the new standard errors?

As David pointed out, it's the full covariance matrix estimate.

hth, Z

Thanks, mojo

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
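To make the reply above concrete: the matrix that vcovHC() returns is the "sandwich" estimate of the covariance of the coefficient estimates, and the robust standard errors are the square roots of its diagonal. Below is a minimal base-R sketch of the simplest (HC0) variant on made-up heteroskedastic data, just to show where the numbers come from; note that vcovHC()'s default in the sandwich package is the slightly different HC3, so the values will not match its default output exactly.

```r
set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50) * (1 + abs(x))   # heteroskedastic errors
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))             # (X'X)^-1
meat  <- crossprod(X * e)                # X' diag(e^2) X  (rows of X scaled by e)
vcov_hc0 <- bread %*% meat %*% bread     # sandwich: HC0 covariance matrix
robust_se <- sqrt(diag(vcov_hc0))        # robust standard errors
```

This reproduces what sandwich::vcovHC(fit, type = "HC0") computes, up to numerical error.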
[R] mlogit error message "Error in X[omitlines, ] <- NA : subscript out of bounds"
Yong, I saw your post from April 29th about the error message in the mlogit package, copied below. I had the same problem, but solved it by omitting all rows with missing data from my data frame before running mlogit.data(), e.g.:

mydata = na.omit(mydata)

I am posting this to the R help list as well so that an answer to this question will be documented.

Andrew Miles

I am using the mlogit package and get a data problem, for which I can't find any clue in the R archives. The code below shows my related code all the way to the error:

#---
mydata <- data.frame(dependent, x, y, z)
mydata$dependent <- as.factor(mydata$dependent)
mldata <- mlogit.data(mydata, varying=NULL, choice="dependent", shape="wide")
summary(mlogit.1 <- mlogit(dependent ~ 1 | x + y + z, data = mldata, reflevel="0"))
"Error in X[omitlines, ] <- NA : subscript out of bounds"
#---

Could anybody kindly give a tip on how I can solve this problem? Thank you

yong

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
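To illustrate the na.omit() workaround on a small toy data frame (hypothetical values, not the poster's data):

```r
# a toy data frame with scattered missing values
mydata <- data.frame(dependent = c("a", "b", NA, "a"),
                     x = c(1, NA, 3, 4),
                     y = c(10, 20, 30, 40))

complete <- na.omit(mydata)  # drops any row containing an NA
nrow(complete)               # only rows 1 and 4 survive
```

Passing `complete` (rather than `mydata`) to mlogit.data() avoids the out-of-bounds subscript that the missing rows trigger.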
[R] Incorrect degrees of freedom in SEM model using lavaan
I have been trying to use lavaan (version 0.4-7) for a simple path model, but the program seems to be computing far fewer degrees of freedom for my model than it should have. I have 7 variables, which should give (7)(8)/2 = 28 covariances, and hence 28 DF. The model seems to think I have only 13 DF. The code to reproduce the problem is below. Have I done something wrong, or is this something I should take to the developer?

library(lavaan)

myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0,
                 .03, .03, 0, 0, 0, 0, 0,
                 6.75, -.08, 519.38, 0, 0, 0, 0,
                 .36, .01, 2.74, .18, 0, 0, 0,
                 .51, .0, -.31, .02, .2, .0, 0,
                 -.17, .0, -1.6, -.04, .01, .25, 0,
                 -.1, .02, -.03, .0, -.01, .01, .02),
  nrow=7, byrow=TRUE,
  dimnames=list(c("Internet", "Stress3", "HHincome", "Race", "Age", "Gender", "Stress1"),
                c("Internet", "Stress3", "HHincome", "Race", "Age", "Gender", "Stress1")))

model = '
Internet ~ HHincome + Race + Age + Gender + Stress1
Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1
'

fit = sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE)

# check the number of parameters being estimated
inspect(fit, what="free")

# Note the DF for the chi-square is 0, when it should be 28 - 13 = 15
summary(fit)

Andrew Miles

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
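One way to reconcile the counts above (an editorial assumption, not a confirmed diagnosis from the thread): with lavaan's default fixed.x = TRUE, the 15 variances and covariances among the five exogenous covariates are fixed at their sample values rather than estimated, so only the moments involving the two endogenous variables enter the fit. That leaves exactly as many fitted moments as free parameters, hence 0 DF. The arithmetic:

```r
p <- 7  # observed variables
q <- 5  # exogenous covariates (HHincome, Race, Age, Gender, Stress1)

total_moments   <- p * (p + 1) / 2  # 28 unique variances/covariances
fixed_x_moments <- q * (q + 1) / 2  # 15 moments among the covariates, held fixed
fitted_moments  <- total_moments - fixed_x_moments  # 13 moments actually modeled

# free parameters: 5 + 6 regression slopes + 2 residual variances = 13
free_params <- 5 + 6 + 2
df <- fitted_moments - free_params  # 0
```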
[R] Higher log-likelihood in null vs. fitted model
Two related questions.

First, I am fitting a model with a single predictor, and then a null model with only the intercept. In theory, the fitted model should have a higher log-likelihood than the null model, but that does not happen. See the output below. My first question is: how can this happen?

> m

Call: glm(formula = school ~ sv_conform, family = binomial, data = dat, weights = weight)

Coefficients:
(Intercept)  sv_conform
    -2.5430      0.2122

Degrees of Freedom: 1488 Total (i.e. Null); 1487 Residual
Null Deviance: 786.1
Residual Deviance: 781.9  AIC: 764.4

> null

Call: glm(formula = school ~ 1, family = binomial, data = dat, weights = weight)

Coefficients:
(Intercept)
     -2.532

Degrees of Freedom: 1488 Total (i.e. Null); 1488 Residual
Null Deviance: 786.1
Residual Deviance: 786.1  AIC: 761.9

> logLik(m); logLik(null)
'log Lik.' -380.1908 (df=2)
'log Lik.' -379.9327 (df=1)

My second question grows out of the first. I ran the same two models on the same data in Stata and got identical coefficients. However, the log-likelihoods were different from the ones I got in R, and followed my expectations - that is, the null model has a lower log-likelihood than the fitted model. See the Stata model comparison below. So my question is: why do identical models fit in R and Stata have different log-likelihoods?

Model | Obs   ll(null)  ll(model)  df  AIC       BIC
------+---------------------------------------------
mod1  | 1489  -393.064  -390.9304   2  785.8608  796.4725
null  | 1489  -393.064  -393.064    1  788.1279  793.4338

Thanks in advance for any input or references.

Andrew Miles

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Higher log-likelihood in null vs. fitted model
Interesting. If you take the deviances printed by the fitted model (including the null deviance) and back-engineer them into log-likelihoods, you get results identical to Stata's.

> m$deviance*(-.5)
[1] -390.9304
> m$null.deviance*(-.5)
[1] -393.064

However, using the deviance to calculate the AIC by hand does not give the AIC that the model returns.

> m$deviance + 2*2
[1] 785.8608
> m$aic
[1] 764.3816

The difference seems to lie in how glm.fit() calculates the AIC that it returns, and which logLik() then uses to return the log-likelihood, as Mark indicated. The function is:

binomial()$aic
function (y, n, mu, wt, dev)
{
    m <- if (any(n > 1)) n else wt
    -2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m * y), round(m), mu, log = TRUE))
}

On the other hand, glm() calculates the deviance from the sum of the deviance residuals. This gives me a sense for how log-likelihoods calculated from the deviance and from the model AIC could differ (i.e., they are calculated in different ways), but it is still not clear to me *why* the two approaches differ. And they only differ when I use weights with the data. More importantly, it makes me wonder which is more reliable - in my case, the deviance-residuals approach seems more reliable, in that a) it gives a sensible result (i.e., a model with a predictor has a higher log-likelihood than the null model), and b) it matches an independent estimation performed in Stata. But is that always the case? And if so, why use the other approach at all?

On Thu, May 31, 2012 at 9:55 AM, Mark Leeds wrote:

> Hi Duncan: I don't know if the following can help, but I checked the code and logLik defines the log-likelihood as (p - glmobject$aic/2), where p is the glmobject$rank. So the reason for the likelihood being less is that, in the null, it ends up being (1 - glmobject$aic/2) and in the other one it ends up being (2 - glmobject$aic/2).
> > so > > 2 - 764.4/2 = -380.2 and > > 1 - 761.9/2 = -379.95 ( close enough for govt work ) > > So, that's where the #'s are coming from but it really depends on how AIC > is defined. > Likelihoods should not involve degrees of freedom ( atleast not where they > make > likelihood less like in the above example ) so maybe backing the > likelihood out using > AIC is the issue ? ( AIC = -2 * likelihood + 2p so p - AIC/2 = > likelihood). AIC is a function of the likelihood but , as far as I know, > likelihood is not a function of the AIC. > Thanks for any insight. > > > > > > > On Thu, May 31, 2012 at 9:26 AM, Duncan Murdoch > wrote: > >> On 12-05-31 8:53 AM, Andrew Miles wrote: >> >>> Two related questions. >>> >>> First, I am fitting a model with a single predictor, and then a null >>> model >>> with only the intercept. In theory, the fitted model should have a >>> higher >>> log-likelihood than the null model, but that does not happen. See the >>> output below. My first question is, how can this happen? >>> >> >> I suspect you'll need to give sample data before anyone can really help >> with this. >> >> >>> m >>>> >>> >>> Call: glm(formula = school ~ sv_conform, family = binomial, data = dat, >>> weights = weight) >>> >>> Coefficients: >>> (Intercept) sv_conform >>> -2.5430 0.2122 >>> >>> Degrees of Freedom: 1488 Total (i.e. Null); 1487 Residual >>> Null Deviance:786.1 >>> Residual Deviance: 781.9 AIC: 764.4 >>> >>>> null >>>> >>> >>> Call: glm(formula = school ~ 1, family = binomial, data = dat, weights = >>> weight) >>> >>> Coefficients: >>> (Intercept) >>> -2.532 >>> >>> Degrees of Freedom: 1488 Total (i.e. Null); 1488 Residual >>> Null Deviance:786.1 >>> Residual Deviance: 786.1 AIC: 761.9 >>> >>>> logLik(m); logLik(null) >>>> >>> 'log Lik.' -380.1908 (df=2) >>> 'log Lik.' -379.9327 (df=1) >>> >>>> >>>> >>> My second question grows out of the first. I ran the same two model on >>> the >>> same data in Stata and got identical coefficients. 
However, the >>> log-likelihoods were different than the one's I got in R, and followed my >>> expectations - that is, the null model has a lower log-likelihood than >>> the >>> fitted model. See the Stata model comparison below. So my question is, >>> why do identical models fit in R and Stata have different log-likelihoods?
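A small base-R check of the deviance/log-likelihood relationship discussed in this thread. For an unweighted 0/1 binomial glm the two bookkeeping routes agree exactly, because the saturated model's log-likelihood is 0, so deviance = -2*logLik; as the original poster noted, the discrepancy only appears once prior weights enter, since binomial()$aic then rescales by the weights while the deviance residuals do not. Made-up data:

```r
set.seed(7)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-2 + 0.3 * x))
m <- glm(y ~ x, family = binomial)

ll_from_deviance <- -m$deviance / 2        # the "Stata-style" back-calculation
ll_from_logLik   <- as.numeric(logLik(m))  # derived internally via binomial()$aic
all.equal(ll_from_deviance, ll_from_logLik)
```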
Re: [R] Seeking pointers to various regression techniques with R?
The R Book by Michael Crawley has some discussion of the type of R syntax you are looking for in the chapter on statistical modelling. As for the formulae you gave:

lm(y ~ x*w - 1) fits an interaction between x and w without an intercept, along with the main effects of x and w.

lm(y ~ x:w - 1) fits just the interaction between x and w, without an intercept and without the main effects of x and w.

lm(y ~ x/w - 1) I believe this fits w nested inside x (it expands to x + x:w).

Andrew Miles

On Jun 5, 2012, at 4:58 PM, Michael wrote:

> I read your website but still don't know the difference between the three formulas...
> Thank you!
>
> On Mon, Jun 4, 2012 at 11:14 PM, Joshua Wiley wrote:
>> Hi Michael,
>> This is far from exhaustive (I wrote it as an introduction some years ago) but you may find it useful to start: https://joshuawiley.com/R/formulae_in_R.aspx
>> Cheers,
>> Josh
>>
>> On Mon, Jun 4, 2012 at 9:06 PM, Michael wrote:
>>> Hi all,
>>> Could you please point me to good materials on various tricks/intuitions/techniques of regression, and hopefully in R?
>>> For example, what does lm(y ~ x * w - 1) mean vs. lm(y ~ x/w - 1) vs. lm(y ~ x:w - 1), etc...
>>> I just found that even simple linear regression is not that simple and there are a lot of tricks/techniques in using them...
>>> Hopefully I can find good materials on these!
>>> Thank you!
>>> [[alternative HTML version deleted]]
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Joshua Wiley
>> Ph.D.
Student, Health Psychology >> Programmer Analyst II, Statistical Consulting Group >> University of California, Los Angeles >> https://joshuawiley.com/ >> > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
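A quick way to see what the three formulas in this thread expand to is to fit them on made-up data and compare the coefficient names (hypothetical variables; x and w are both continuous here):

```r
set.seed(42)
d <- data.frame(x = rnorm(30), w = rnorm(30))
d$y <- 1 + 2 * d$x + 3 * d$w + 0.5 * d$x * d$w + rnorm(30)

n_star  <- names(coef(lm(y ~ x * w - 1, data = d)))  # "x" "w" "x:w": main effects + interaction
n_colon <- names(coef(lm(y ~ x : w - 1, data = d)))  # "x:w": interaction only
n_slash <- names(coef(lm(y ~ x / w - 1, data = d)))  # "x" "x:w": x plus w-within-x
```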
Re: [R] How do I obtain the current active path of a function that's being called?
Can you provide an example of the code file that you use to call the different functions? Without that it might be hard for people to answer your question. Offhand, I'd say that the active version is whichever function A was sourced last. If you are loading functions into the workspace, they are treated like any other object, which is to say that you can only have one function of a given name at a time. Hence, whenever you source a file that defines function A, the old function A gets overwritten.

Andrew Miles

On Jun 5, 2012, at 4:58 PM, Michael wrote:

> Hi all,
> How do I obtain the current active path of a function that's being called?
> That is to say, I have several source files and they all contain a definition of function A.
> I would like to figure out which function A, from which file, is the one that's being called and is currently active?
> Thanks a lot!
> [[alternative HTML version deleted]]
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
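The overwriting behavior described in the reply is easy to demonstrate: assigning (or sourcing) a second definition of the same name silently replaces the first, so only the most recently loaded version is ever "active" (toy functions, not the poster's files):

```r
A <- function() "version from file 1"
A <- function() "version from file 2"  # e.g., loaded later by a second source() call

A()  # only the second definition survives
```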
[R] FIML using lavaan returns zeroes for coefficients
Hello! I am trying to reproduce (for a publication) analyses that I ran several months ago using lavaan; I'm not sure which version, probably 0.4-12. A sample model is given below:

pathmod = 'mh30days.log.w2 ~ mh30days.log + joingroup + leavegroup + alwaysgroup + grp.partic.w2 + black + age + bivoc + moved.conf + local.noretired + retired + ds + ministrytime + hrswork + nomoralescore.c + negint.c + cong.conflict.c + nomoraleXjoin + nomoraleXleave + nomoraleXalways + negintXjoin + negintXleave + negintXalways + conflictXjoin + conflictXleave + conflictXalways'

mod1 = sem(pathmod, data=sampledat, missing="fiml", se="robust")

At the time, the model ran fine. Now, using version 0.4-14, the model returns all 0's for the coefficients. This does not happen, however, when I run the model using listwise deletion for missing data. Any idea what is happening, or how I can fix it?

For those wishing to reproduce the problem, you can download a sample code file and data frame from the following two links:

https://fds.duke.edu/db/aas/Sociology/grad/aam34/files/problem%20code.R
https://fds.duke.edu/db/aas/Sociology/grad/aam34/files/sampledat.rdata

Thanks for your help!

Andrew Miles

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] FIML using lavaan returns zeroes for coefficients
Thanks for the helpful explanation. As to your question, I sometimes use lavaan to fit univariate regressions simply because it can handle missing data using FIML rather than listwise deletion. Are there reasons to avoid this? BTW, thanks for the update in the development version. Andrew Miles On Jul 21, 2012, at 12:59 PM, yrosseel wrote: > On 07/20/2012 10:35 PM, Andrew Miles wrote: >> Hello! >> >> I am trying to reproduce (for a publication) analyses that I ran >> several months ago using lavaan, I'm not sure which version, probably >> 0.4-12. A sample model is given below: >> >> pathmod='mh30days.log.w2 ~ mh30days.log + joingroup + leavegroup + >> alwaysgroup + grp.partic.w2 + black + age + bivoc + moved.conf + >> local.noretired + retired + ds + ministrytime + hrswork + >> nomoralescore.c + negint.c + cong.conflict.c + nomoraleXjoin + >> nomoraleXleave + nomoraleXalways + negintXjoin + negintXleave + >> negintXalways + conflictXjoin + conflictXleave + conflictXalways ' >> mod1 = sem(pathmod, data=sampledat, missing="fiml", se="robust") >> >> At the time, the model ran fine. Now, using version 0.4-14, the >> model returns all 0's for coefficients. > > What happened is that since 0.4-14, lavaan tries to 'detect' models that are > just univariate regression, and internally calls lm.fit, instead of the > lavaan estimation engine, at least when the missing="ml" argument is NOT > used. (BTW, I fail to understand why you would use lavaan if you just want to > fit a univariate regression). > > When missing="ml" is used, lavaan normally checks if you have fixed x > covariates (which you do), and if fixed.x=TRUE (which is the default). In > 0.4, lavaan internally switches to fixed.x=FALSE (which implicitly assumes > that all your predictors are continuous, but I assume you would not using > missing="ml" otherwise). Unfortunately, for the 'special' case of univariate > regression, it fails to do this. 
This behavior will likely change in 0.5, > where, by default, only endogenous/dependent variables will be handled by > missing="ml", not exogenous 'x' covariates. > > To fix it: simply add the fixed.x=FALSE argument, or revert to 0.4-12 to get > the old behavior. > > Hope this helps, > > Yves. > http://lavaan.org > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Lavaan: Immediate non-positive definite matrix
Hi, I recently tried to estimate a linear unconditional latent growth curve on 7 repeated measures using lavaan (most recent version):

modspec = '
alpha =~ 1*read_g0 + 1*read_g1 + 1*read_g2 + 1*read_g3 + 1*read_g4 + 1*read_g5 + 1*read_g6
beta =~ 0*read_g0 + 1*read_g1 + 2*read_g2 + 3*read_g3 + 4*read_g4 + 5*read_g5 + 6*read_g6
'

gmod = lavaan(modspec, data=math, meanstructure=T, int.ov.free=F, int.lv.free=T, auto.var=T, auto.cov.lv.x=T, auto.cov.y=T, missing="direct", verbose=T)

The model immediately returned the following error messages:

Error in chol.default(S) : the leading minor of order 5 is not positive definite
Error in lavSampleStatsFromData(Data = lavaanData, rescale = (lavaanOptions$estimator == : lavaan ERROR: sample covariance can not be inverted

I ran the same model in Mplus and found that it has low covariance coverage for the 7 repeated measures, but all coverage is greater than 0. The model ran fine in Mplus once I set the covariance coverage limit to .001. That provided me with starting values to try in lavaan, but again it immediately returned the same error message. In fact, nothing I could do allowed me to fit the model without it immediately returning the same error (e.g., I tried changing the optimizer and the type of likelihood being used). Because the error message pops up immediately (i.e., not after lavaan tries to estimate the model for a few seconds), it makes me think that my data is failing some sort of initial data check. Any ideas what is happening, or what to do about it? Thanks.

P.S. I haven't had success in attaching data files when I submit to the R help list, so if you would like the data please email me at andrewami...@hotmail.com.

Andrew Miles

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
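The first error in the post comes straight from base R's chol(): it is raised the moment the assembled sample covariance matrix fails a Cholesky factorization, before any estimation starts, which is consistent with the "initial data check" suspicion. A toy reproduction with a singular (hence non-positive-definite) matrix:

```r
S <- matrix(c(1, 1,
              1, 1), nrow = 2)       # rank 1: not positive definite

res <- try(chol(S), silent = TRUE)   # "the leading minor of order 2 is not positive definite"
failed <- inherits(res, "try-error")
```

With sparse pairwise coverage, the covariance matrix lavaan builds can end up non-positive-definite in exactly this way even when every individual coverage value is above zero.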
Re: [R] Wide to long form conversion
Take a look here. http://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format Andrew Miles Department of Sociology Duke University On Oct 6, 2011, at 4:28 PM, Gang Chen wrote: > I have some data 'myData' in wide form (attached at the end), and > would like to convert it to long form. I wish to have five variables > in the result: > > 1) Subj: factor > 2) Group: between-subjects factor (2 levels: s / w) > 3) Reference: within-subject factor (2 levels: Me / She) > 4) F: within-subject factor (2 levels: F1 / F2) > 5) J: within-subject factor (2 levels: J1 / J2) > > As this is the first time I'm learning such a conversion, could > someone help me out? > > Many thanks, > Gang > >> myData > > Group MeF1 MeJ1 SheF1 SheJ1 MeF2 MeJ2 SheF2 SheJ2 Subj > 1 s45 61036 6 9 S1 > 2 s65 5 643 5 6 S2 > 3 s74 6 574 5 3 S3 > 4 s85 8 771 8 6 S4 > 5 s 106 4 796 4 6 S5 > 6 s52 4 741 4 2 S6 > 7 s 13210 4 112 4 3 S7 > 8 s81 31160 310 S8 > 9 s69 5 868 5 6 S9 > 10 s 145 610 135 510 S10 > 11 s 15218 2 14118 2 S11 > 12 s69 4 95 11 3 8 S12 > 13 s55 01243 0 8 S13 > 14 s56 4 946 2 6 S14 > 15 s 14512 3 12311 3 S15 > 16 s7211 35210 2 S16 > 17 s17 4 516 3 5 S17 > 18 s62 7 462 7 4 S18 > 19 s94 8 5 104 6 3 S19 > 20 s82 6 592 6 4 S20 > 21 s65 5 766 5 5 S21 > 22 s88 3 767 5 3 S22 > 23 s 114 6 711 6 4 S23 > 24 s63 2 464 2 2 S24 > 25 s44 6 623 4 6 S25 > 26 w59 4 737 3 5 S26 > 27 w76 3 541 0 4 S27 > 28 w 10414 28410 2 S28 > 29 w97 5 684 5 3 S29 > 30 w92 7 562 6 5 S30 > 31 w67 6 765 5 8 S31 > 32 w7612 76310 7 S32 > 33 w 123 8 9 113 4 7 S33 > 34 w 12210 592 6 3 S34 > 35 w6310 453 5 3 S35 > 36 w93 9 963 7 8 S36 > 37 w5 11 7 74 11 3 4 S37 > 38 w74 4 673 1 5 S38 > 39 w65 1 833 0 8 S39 > 40 w 10310 273 7 2 S40 > 41 w1 11 7 518 4 3 S41 > 42 w 105 610 104 3 9 S42 > 43 w63 9 242 6 0 S43 > 44 w9511 454 7 3 S44 > 45 w85 6 384 2 3 S45 > 46 w84 8 741 2 6 S46 > 47 w 122 6 2 101 5 2 S47 > 48 w 106 9 875 7 8 S48 > 49 w 13615 1 12414 0 S49 > 50 w78 11247 111 S50 > 51 w 123 9 491 7 4 S51 > > __ > 
R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
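Besides the approaches in the linked Stack Overflow thread, base R's reshape() handles this kind of wide-to-long conversion; below is a minimal sketch on a two-subject, made-up slice with the same naming pattern as the posted data (values hypothetical). The remaining Me/She and J1/J2 splits could be handled the same way with additional varying groups.

```r
wide <- data.frame(Subj = c("S1", "S2"), Group = c("s", "w"),
                   MeF1 = c(4, 6), MeF2 = c(6, 4),
                   SheF1 = c(10, 6), SheF2 = c(9, 6))

long <- reshape(wide, direction = "long",
                varying = list(c("MeF1", "MeF2"), c("SheF1", "SheF2")),
                v.names = c("Me", "She"),
                timevar = "F", times = c("F1", "F2"),
                idvar = "Subj")
# 2 subjects x 2 F levels = 4 rows, with F as a within-subject column
```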
[R] FIML in R
Does anyone know if someone is developing full-information maximum likelihood (FIML) estimation algorithms for basic regression functions, like glm()? I think that having FIML options for workhorse functions that typically use ML would give R an edge over other statistical software, given how well FIML performs in missing data situations compared to ML. While my current level of programming ability isn't up to the task, I bring this up to a) see if there is something already underway, and if not b) perhaps spark interest in doing so. I think much of the raw material is already out there: there are other packages that use the EM algorithm, and other packages that search out missing data patterns (e.g., md.pattern() in the mice package). To facilitate adoption, it might be useful to write a FIML module that other package maintainers can incorporate into their code. I'd be interested in updates on this, and in your thoughts on this more generally. Also, please let me know if there is a forum better suited for this kind of discussion. Andrew Miles __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multivariate Multilevel Model: is R the right software for this problem
I recommend looking at chapter 6 of Paul Allison's Fixed Effects Regression Models. This chapter outlines how you can use a structural equation modeling framework to estimate a multi-level model (a random effects model). This approach is slower than just using MLM software like lmer() in the lme4 package, but has the advantage of being able to specify correlations between errors across time, the ability to control for time-invariant effects of time-invariant variables, and allows you to use the missing data maximum likelihood that comes in structural equation modeling packages. Andrew Miles Department of Sociology Duke University On Apr 6, 2012, at 9:48 AM, Eiko Fried wrote: > Hello, > > I've been trying to answer a problem I have had for some months now and > came across multivariate multilevel modeling. I know MPLUS and SPSS quite > well but these programs could not solve this specific difficulty. > > My problem: > 9 correlated dependent variables (medical symptoms; categorical, 0-3), 5 > measurement points, 10 time-varying covariates (life events; dichotomous, > 0-1), N ~ 900. Up to 35% missing values on some variables, especially at > later measurement points. > > My exploratory question is whether there is an interaction effect between > life events and symptoms - and if so, what the effect is exactly. E.g. life > event 1 could lead to more symptoms A B D whereas life event 2 could lead > to more symptoms A C D and less symptoms E. > > My question is: would MMM in R be a viable option for this? If so, could > you recommend literature? > > Thank you > --T > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fixed effects linear model in R
Based on Paul Allison's booklet "Fixed Effects Regression Models" (2009), the FE model can be estimated by person-mean centering all of your predictors (but not the outcome), and then including a random intercept for each person. The centering gives you the FE model estimates, and the random intercept adjusts the standard errors for clustering by individuals. Note that your data must be in person-period (or long) format to do this.

In case you are unfamiliar with person-mean centering, it simply means taking the mean of each person's values on a given variable across all of the periods in your data, and then calculating each period's deviation from that mean. For example, a person's average income over four years might be $50,000, but in each year their actual income would be slightly higher or lower than this (these are the person-mean deviations). In symbolic form, your code might look something like this:

library(lme4)
variable_pmcentered = variable - person_mean
mod = lmer(outcome ~ variable_pmcentered + person_mean + other_predictors + (1|personID))

The advantage of this method (which Allison calls a "hybrid" method) over traditional FE models is that you get the benefits of a FE model (subtracting out time-invariant omitted variables) along with the benefits of random-effects models (e.g., estimating coefficients for time-invariant variables, estimating interactions with time, letting intercepts and slopes vary randomly, etc.). See Allison's booklet for more details on this method.

Allison, Paul D. 2009. Fixed Effects Regression Models. Los Angeles, CA: Sage.

Andrew Miles

On Feb 7, 2012, at 5:00 PM, caribou...@gmx.fr wrote:

> Dear R-helpers,
> First of all, sorry to those who have already received this request. The mail has bounced several times, so I am not sure the list has received it... and I need help (if you have time)! ;-)
> I have a very simple question and I really hope that someone can help me.
> I would like to estimate a simple fixed-effects regression model with standard errors clustered by individual. For those using Stata, the counterpart would be xtreg with the "fe" option, or areg with the "absorb" option; in both cases the clustering is achieved with "vce(cluster id)".
> My question is: how can I do that with R? An important point is that I have too many individuals, therefore I cannot include dummies and should use the usual demeaning procedure. I tried the plm package with the "within" option, but R quickly tells me that the memory limits are reached (I have over 10 GB of RAM!) while the dataset is only 700 MB (about 50,000 individuals, highly unbalanced). I don't understand... plm does indeed demean the data, so the computation should be fast and light enough... and instead it saturates my memory and does not converge...
> Do you have an idea? Moreover, is it possible to obtain cluster-robust standard errors with plm?
> Are there any other solutions for fixed-effects linear models (with the demeaning trick, in order not to create as many dummies as there are individuals)?
> Many thanks in advance! ;)
> John

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
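The person-mean centering step in the reply above needs only base R's ave(); here is a sketch on made-up person-period data (the lmer() call is left as a comment since it needs lme4 and an outcome variable, both assumed):

```r
# hypothetical long-format data: 3 people, 4 periods each
d <- data.frame(personID = rep(1:3, each = 4),
                income = c(48, 50, 51, 51,
                           60, 62, 59, 63,
                           40, 41, 39, 44))

d$person_mean <- ave(d$income, d$personID)  # each person's own mean
d$income_pmc  <- d$income - d$person_mean   # within-person deviations

# the within-person deviations average to zero for every person
within_means <- tapply(d$income_pmc, d$personID, mean)

# then, e.g.:
# lme4::lmer(outcome ~ income_pmc + person_mean + (1 | personID), data = d)
```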
Re: [R] Help need
You need to store the value from each iteration in a vector, and then display the vector after the loop terminates. I made a few updates to your code, and it seems to do what you want now. And thanks for including your code - that made it easy to know how to help.

spectrum = c()
for (f in seq(0, 0.5, 0.1)) {
    sigmasquared <- 1
    i = complex(real = 0, imaginary = 1)
    spectrum <- c(spectrum, sigmasquared / (abs(1 - 2.7607*exp(2*pi*i*f) + 3.8106*exp(4*pi*i*f) - 2.6535*exp(6*pi*i*f) + 0.9258*exp(8*pi*i*f))^2))
}
spectrum

Andrew Miles

On Feb 7, 2012, at 4:08 PM, Jaymin Shah wrote:

> I have made a for loop to try and output values which I have named spectrum. However, I cannot seem to get the answers to come out as a vector, which is what I need. They come out as separate values which I am then unable to join together. Thank you
>
> for(f in seq(0,0.5,0.1)) {
> sigmasqaured <- 1
> i = complex(real = 0, imaginary = 1)
> spectrum <- (sigmasqaured)/(abs(1-2.7607*exp(2*pi*i*f)+3.8106*exp(4*pi*i*f)-2.6535*exp(6*pi*i*f)+0.9258*exp(8*pi*i*f))^2)
> print(spectrum)
> }
> [[alternative HTML version deleted]]
> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
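Since exp() and abs() are vectorized over complex arguments in R, the loop is not actually needed; the whole spectrum can be computed in one shot (same constants as the original code):

```r
f <- seq(0, 0.5, 0.1)
i <- complex(real = 0, imaginary = 1)

# evaluate the spectral density at all frequencies at once
spectrum <- 1 / abs(1 - 2.7607 * exp(2*pi*i*f) + 3.8106 * exp(4*pi*i*f)
                      - 2.6535 * exp(6*pi*i*f) + 0.9258 * exp(8*pi*i*f))^2
length(spectrum)  # one value per frequency: 6
```

This gives the same numbers as the corrected loop, just without growing a vector one element at a time.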