Re: [R] How to obtain individual log-likelihood value from glm?
In the book Modern Applied Statistics with S, 4th edition, 2002, by Venables and Ripley, there is a function logitreg on page 445 which, judging by its loss function, does provide the weighted logistic regression I asked about. Interestingly enough, logitreg produces the same coefficients as glm in the example I provided earlier, even with weights < 1, and it also yields the same residual deviance as glm. Unless I have misunderstood something, I am convinced that glm is a valid tool for weighted logistic regression, despite the description of its weights argument and the somewhat questionable logLik value in the case of non-integer weights < 1. Perhaps this is a bold claim: the description of weights could be modified and logLik updated accordingly. The Stack Exchange inquiry I provided is what I find interesting, not the link inside that post. Sorry for the confusion.

On Sat, Aug 29, 2020 at 10:18 AM John Smith wrote: > Thanks for the very insightful thoughts. What I am trying to achieve with the > weights is actually not new, something like > https://stats.stackexchange.com/questions/44776/logistic-regression-with-weighted-instances. > I thought my inquiry was not too strange, and I could use some existing > code. It is just an optimization problem at the end of the day, or not? Thanks > > On Sat, Aug 29, 2020 at 9:02 AM John Fox wrote: > >> Dear John, >> >> On 2020-08-29 1:30 a.m., John Smith wrote: >> > Thanks Prof. Fox. >> > >> > I am curious: what is the model estimated below? >> >> Nonsense, as Peter explained in a subsequent response to your prior >> posting. >> >> > >> > I guess my inquiry seems more complicated than I thought: with y being >> 0/1, how to fit weighted logistic regression with weights <1, in the sense >> of weighted least squares? Thanks >> >> What sense would that make? WLS is meant to account for non-constant >> error variance in a linear model, but in a binomial GLM, the variance is >> purely a function of the mean.
>> >> If you had binomial (rather than binary 0/1) observations (i.e., >> binomial trials exceeding 1), then you could account for overdispersion, >> e.g., by introducing a dispersion parameter via the quasibinomial >> family, but that isn't equivalent to variance weights in a LM, rather to >> the error-variance parameter in a LM. >> >> I guess the question is what are you trying to achieve with the weights? >> >> Best, >> John >> >> > >> >> On Aug 28, 2020, at 10:51 PM, John Fox wrote: >> >> >> >> Dear John >> >> >> >> I think that you misunderstand the use of the weights argument to >> glm() for a binomial GLM. From ?glm: "For a binomial GLM prior weights are >> used to give the number of trials when the response is the proportion of >> successes." That is, in this case y should be the observed proportion of >> successes (i.e., between 0 and 1) and the weights are integers giving the >> number of trials for each binomial observation. >> >> >> >> I hope this helps, >> >> John >> >> >> >> John Fox, Professor Emeritus >> >> McMaster University >> >> Hamilton, Ontario, Canada >> >> web: https://socialsciences.mcmaster.ca/jfox/ >> >> >> >>> On 2020-08-28 9:28 p.m., John Smith wrote: >> >>> If the weights < 1, then we have different values! See an example >> below. >> >>> How should I interpret logLik value then? >> >>> set.seed(135) >> >>> y <- c(rep(0, 50), rep(1, 50)) >> >>> x <- rnorm(100) >> >>> data <- data.frame(cbind(x, y)) >> >>> weights <- c(rep(1, 50), rep(2, 50)) >> >>> fit <- glm(y~x, data, family=binomial(), weights/10) >> >>> res.dev <- residuals(fit, type="deviance") >> >>> res2 <- -0.5*res.dev^2 >> >>> cat("loglikelihood value", logLik(fit), sum(res2), "\n") >> >>>> On Tue, Aug 25, 2020 at 11:40 AM peter dalgaard >> wrote: >> >>>> If you don't worry too much about an additive constant, then half the >> >>>> negative squared deviance residuals should do. (Not quite sure how >> weights >> >>>> factor in. Looks like they are accounted for.) 
>> >>>> >> >>>> -pd >> >>>> >> >>>>> On 25 Aug 2020, at 17:33 , John Smith wrote: >> >>>>> >> >>>>> Dear R-help, >> >&
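The weighted fit that logitreg performs can be sketched by minimizing the weighted negative log-likelihood directly with optim(). This is a minimal sketch, not the book's actual code; the function name wlogit is my own, and the data reuse the example from earlier in the thread:

```r
## Minimal sketch of logitreg-style weighted logistic regression:
## minimize the weighted negative log-likelihood directly. The weights
## 'w' may be any positive reals, integer or not.
wlogit <- function(X, y, w = rep(1, length(y))) {
  negll <- function(beta) {
    eta <- drop(X %*% beta)
    ## plogis(eta, log.p = TRUE) is a numerically stable log(1/(1+exp(-eta)))
    -sum(w * (y * plogis(eta, log.p = TRUE) +
              (1 - y) * plogis(-eta, log.p = TRUE)))
  }
  optim(rep(0, ncol(X)), negll, method = "BFGS")$par
}

## The example from earlier in the thread, with non-integer weights < 1
set.seed(135)
y <- c(rep(0, 50), rep(1, 50))
x <- rnorm(100)
X <- cbind(1, x)
w <- c(rep(1, 50), rep(2, 50)) / 10

## glm warns about non-integer "successes" but fits the same coefficients
fit.glm <- suppressWarnings(glm(y ~ x, family = binomial(), weights = w))
cbind(optim = wlogit(X, y, w), glm = coef(fit.glm))
```

Both columns should agree to optimization tolerance, which is the observation made above about logitreg and glm.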
Re: [R] How to obtain individual log-likelihood value from glm?
Thanks for the very insightful thoughts. What I am trying to achieve with the weights is actually not new, something like https://stats.stackexchange.com/questions/44776/logistic-regression-with-weighted-instances. I thought my inquiry was not too strange, and I could use some existing code. It is just an optimization problem at the end of the day, or not? Thanks

On Sat, Aug 29, 2020 at 9:02 AM John Fox wrote: > Dear John, > > On 2020-08-29 1:30 a.m., John Smith wrote: > > Thanks Prof. Fox. > > > > I am curious: what is the model estimated below? > > Nonsense, as Peter explained in a subsequent response to your prior > posting. > > > > > I guess my inquiry seems more complicated than I thought: with y being > 0/1, how to fit weighted logistic regression with weights <1, in the sense > of weighted least squares? Thanks > > What sense would that make? WLS is meant to account for non-constant > error variance in a linear model, but in a binomial GLM, the variance is > purely a function of the mean. > > If you had binomial (rather than binary 0/1) observations (i.e., > binomial trials exceeding 1), then you could account for overdispersion, > e.g., by introducing a dispersion parameter via the quasibinomial > family, but that isn't equivalent to variance weights in a LM, rather to > the error-variance parameter in a LM. > > I guess the question is what are you trying to achieve with the weights? > > Best, > John > > > > >> On Aug 28, 2020, at 10:51 PM, John Fox wrote: > >> > >> Dear John > >> > >> I think that you misunderstand the use of the weights argument to glm() > for a binomial GLM. From ?glm: "For a binomial GLM prior weights are used > to give the number of trials when the response is the proportion of > successes." That is, in this case y should be the observed proportion of > successes (i.e., between 0 and 1) and the weights are integers giving the > number of trials for each binomial observation.
> >> > >> I hope this helps, > >> John > >> > >> John Fox, Professor Emeritus > >> McMaster University > >> Hamilton, Ontario, Canada > >> web: https://socialsciences.mcmaster.ca/jfox/ > >> > >>> On 2020-08-28 9:28 p.m., John Smith wrote: > >>> If the weights < 1, then we have different values! See an example > below. > >>> How should I interpret logLik value then? > >>> set.seed(135) > >>> y <- c(rep(0, 50), rep(1, 50)) > >>> x <- rnorm(100) > >>> data <- data.frame(cbind(x, y)) > >>> weights <- c(rep(1, 50), rep(2, 50)) > >>> fit <- glm(y~x, data, family=binomial(), weights/10) > >>> res.dev <- residuals(fit, type="deviance") > >>> res2 <- -0.5*res.dev^2 > >>> cat("loglikelihood value", logLik(fit), sum(res2), "\n") > >>>> On Tue, Aug 25, 2020 at 11:40 AM peter dalgaard > wrote: > >>>> If you don't worry too much about an additive constant, then half the > >>>> negative squared deviance residuals should do. (Not quite sure how > weights > >>>> factor in. Looks like they are accounted for.) > >>>> > >>>> -pd > >>>> > >>>>> On 25 Aug 2020, at 17:33 , John Smith wrote: > >>>>> > >>>>> Dear R-help, > >>>>> > >>>>> The function logLik can be used to obtain the maximum log-likelihood > >>>> value > >>>>> from a glm object. This is an aggregated value, a summation of > individual > >>>>> log-likelihood values. How do I obtain individual values? In the > >>>> following > >>>>> example, I would expect 9 numbers since the response has length 9. I > >>>> could > >>>>> write a function to compute the values, but there are lots of > >>>>> family members in glm, and I am trying not to reinvent wheels. > Thanks! > >>>>> > >>>>> counts <- c(18,17,15,20,10,20,25,13,12) > >>>>> outcome <- gl(3,1,9) > >>>>> treatment <- gl(3,3) > >>>>> data.frame(treatment, outcome, counts) # showing data > >>>>> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson()) > >>>>> (ll <- logLik(glm.D93)) > >>>>> > >>>>>[[alternative HTML version deleted]] > >>>>> > >>>>> ___
Re: [R] How to obtain individual log-likelihood value from glm?
Thanks Prof. Fox. I am curious: what is the model estimated below? I guess my inquiry seems more complicated than I thought: with y being 0/1, how to fit weighted logistic regression with weights <1, in the sense of weighted least squares? Thanks > On Aug 28, 2020, at 10:51 PM, John Fox wrote: > > Dear John > > I think that you misunderstand the use of the weights argument to glm() for a > binomial GLM. From ?glm: "For a binomial GLM prior weights are used to give > the number of trials when the response is the proportion of successes." That > is, in this case y should be the observed proportion of successes (i.e., > between 0 and 1) and the weights are integers giving the number of trials for > each binomial observation. > > I hope this helps, > John > > John Fox, Professor Emeritus > McMaster University > Hamilton, Ontario, Canada > web: https://socialsciences.mcmaster.ca/jfox/ > >> On 2020-08-28 9:28 p.m., John Smith wrote: >> If the weights < 1, then we have different values! See an example below. >> How should I interpret logLik value then? >> set.seed(135) >> y <- c(rep(0, 50), rep(1, 50)) >> x <- rnorm(100) >> data <- data.frame(cbind(x, y)) >> weights <- c(rep(1, 50), rep(2, 50)) >> fit <- glm(y~x, data, family=binomial(), weights/10) >> res.dev <- residuals(fit, type="deviance") >> res2 <- -0.5*res.dev^2 >> cat("loglikelihood value", logLik(fit), sum(res2), "\n") >>> On Tue, Aug 25, 2020 at 11:40 AM peter dalgaard wrote: >>> If you don't worry too much about an additive constant, then half the >>> negative squared deviance residuals should do. (Not quite sure how weights >>> factor in. Looks like they are accounted for.) >>> >>> -pd >>> >>>> On 25 Aug 2020, at 17:33 , John Smith wrote: >>>> >>>> Dear R-help, >>>> >>>> The function logLik can be used to obtain the maximum log-likelihood >>> value >>>> from a glm object. This is an aggregated value, a summation of individual >>>> log-likelihood values. How do I obtain individual values? 
In the >>> following >>>> example, I would expect 9 numbers since the response has length 9. I >>> could >>>> write a function to compute the values, but there are lots of >>>> family members in glm, and I am trying not to reinvent wheels. Thanks! >>>> >>>> counts <- c(18,17,15,20,10,20,25,13,12) >>>> outcome <- gl(3,1,9) >>>> treatment <- gl(3,3) >>>> data.frame(treatment, outcome, counts) # showing data >>>> glm.D93 <- glm(counts ~ outcome + treatment, family = poisson()) >>>> (ll <- logLik(glm.D93)) >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> __ >>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> -- >>> Peter Dalgaard, Professor, >>> Center for Statistics, Copenhagen Business School >>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >>> Phone: (+45)38153501 >>> Office: A 4.23 >>> Email: pd@cbs.dk Priv: pda...@gmail.com >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>[[alternative HTML version deleted]] >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to obtain individual log-likelihood value from glm?
If the weights < 1, then we have different values! See an example below. How should I interpret logLik value then? set.seed(135) y <- c(rep(0, 50), rep(1, 50)) x <- rnorm(100) data <- data.frame(cbind(x, y)) weights <- c(rep(1, 50), rep(2, 50)) fit <- glm(y~x, data, family=binomial(), weights/10) res.dev <- residuals(fit, type="deviance") res2 <- -0.5*res.dev^2 cat("loglikelihood value", logLik(fit), sum(res2), "\n") On Tue, Aug 25, 2020 at 11:40 AM peter dalgaard wrote: > If you don't worry too much about an additive constant, then half the > negative squared deviance residuals should do. (Not quite sure how weights > factor in. Looks like they are accounted for.) > > -pd > > > On 25 Aug 2020, at 17:33 , John Smith wrote: > > > > Dear R-help, > > > > The function logLik can be used to obtain the maximum log-likelihood > value > > from a glm object. This is an aggregated value, a summation of individual > > log-likelihood values. How do I obtain individual values? In the > following > > example, I would expect 9 numbers since the response has length 9. I > could > > write a function to compute the values, but there are lots of > > family members in glm, and I am trying not to reinvent wheels. Thanks! > > > > counts <- c(18,17,15,20,10,20,25,13,12) > > outcome <- gl(3,1,9) > > treatment <- gl(3,3) > > data.frame(treatment, outcome, counts) # showing data > > glm.D93 <- glm(counts ~ outcome + treatment, family = poisson()) > > (ll <- logLik(glm.D93)) > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
> > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: pd@cbs.dk Priv: pda...@gmail.com > > > > > > > > > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
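Peter's suggestion can be checked directly on the Poisson example from the original post. A sketch (the names ll.exact, ll.dev, and const are mine): dpois() gives the exact per-observation log-likelihoods, and half the negative squared deviance residuals differ from them only by the saturated-model constant.

```r
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())

ll.exact <- dpois(counts, fitted(glm.D93), log = TRUE)  # 9 individual values
ll.dev   <- -0.5 * residuals(glm.D93, type = "deviance")^2
const    <- dpois(counts, counts, log = TRUE)           # saturated-model term

all.equal(sum(ll.exact), as.numeric(logLik(glm.D93)))   # the 9 values sum to logLik
all.equal(ll.exact, ll.dev + const)                     # Peter's additive constant
```

For the Poisson family, -0.5 * d_i^2 = ll.exact_i - const_i exactly, so the deviance-residual route recovers the individual log-likelihoods up to that constant.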
Re: [R] Passing formula and weights error
Thanks to Duncan and Bill for very helpful tips. On Fri, Aug 28, 2020 at 11:38 AM William Dunlap wrote: > Note that neither call to glm in your myglm function really works - > the first one is using the 'weights' object from the global > environment, not the weights argument. E.g., in the fresh R session, > where I avoid making unneeded assignments and use fixed x and y for > repeatability, > > > n <- 16 > > data <- data.frame(x = log2(1:n), y = 1:n) > > myglm2 <- function(formula, data, weights) > { > glm(formula, data=data, family=gaussian(), weights=weights) > } > > myglm2(y~., data=data, weights=1/(1:n)) > Error in model.frame.default(formula = formula, data = data, weights > = weights, : > invalid type (closure) for variable '(weights)' > > The error arises because glm finds stats::weights, a function, not the > argument called weights. glm(), lm() and their ilk evaluate their > weights and subset arguments in the environment of the formula. In > this case environment(y~.) is .GlobalEnv, not the function's > environment. The following function gives one way to deal with this, > by giving formula a new environment that inherits from its original > environment and contains the extra variables. > > > myglm3 <- function(formula, data, weights) > { > envir <- list2env(list(weights=weights), > parent=environment(formula)) > environment(formula) <- envir > glm(formula, data=data, family=gaussian(), weights=weights) > } > > myglm3(y~., data=data, weights=1/(1:n)) > > Call: glm(formula = formula, family = gaussian(), data = data, > weights = weights) > > Coefficients: > (Intercept)x > -0.09553 2.93352 > > Degrees of Freedom: 15 Total (i.e. Null); 14 Residual > Null Deviance: 60.28 > Residual Deviance: 7.72 AIC: 70.42 > > This is the same result you get with a direct call to > glm(y~., data=data, weights=1/(1:n)) > > This is a common problem and I don't know if there is a FAQ on it or a > standard function to deal with it. 
> > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Fri, Aug 28, 2020 at 8:33 AM John Smith wrote: > > > > Dear R-help: > > > > I am writing a function based on glm and would like some variations of > > weights. In the code below, I couldn't understand why the second glm > > function fails and don't know how to fix it: > > > > Error in eval(extras, data, env) : object 'newweights' not found > > Calls: print ... eval -> -> model.frame.default -> eval -> > eval > > Execution halted > > > > ### R code > > y <- rnorm(100) > > x <- rnorm(100) > > data <- data.frame(cbind(x, y)) > > weights <- rep(1, 100) > > n <- 100 > > myglm <- function(formula, data, weights){ > > ## this works > > print(glm(formula, data, family=gaussian(), weights)) > > ## this is not working > > newweights <- rep(1, n) > > glm(formula, data, family=gaussian(), weights=newweights) > > } > > myglm(y~., data, weights) > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Passing formula and weights error
Dear R-help: I am writing a function based on glm and would like some variations of weights. In the code below, I couldn't understand why the second glm function fails and don't know how to fix it: Error in eval(extras, data, env) : object 'newweights' not found Calls: print ... eval -> -> model.frame.default -> eval -> eval Execution halted ### R code y <- rnorm(100) x <- rnorm(100) data <- data.frame(cbind(x, y)) weights <- rep(1, 100) n <- 100 myglm <- function(formula, data, weights){ ## this works print(glm(formula, data, family=gaussian(), weights)) ## this is not working newweights <- rep(1, n) glm(formula, data, family=gaussian(), weights=newweights) } myglm(y~., data, weights) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
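One alternative to the environment manipulation Bill describes is to splice the weights into the call by value with bquote(), so that model.frame() never has to look the 'newweights' symbol up at all. A sketch under this thread's setup; myglm4 is a hypothetical name:

```r
## Sketch: substitute the weights vector into the glm() call by value,
## so model.frame() evaluates a literal numeric vector instead of
## searching for 'newweights' in the formula's environment.
myglm4 <- function(formula, data, weights) {
  newweights <- rep(1, nrow(data))  # the internally-built weights
  cl <- bquote(glm(.(formula), data = .(data), family = gaussian(),
                   weights = .(newweights)))
  eval(cl)
}

y <- rnorm(100)
x <- rnorm(100)
d <- data.frame(x = x, y = y)
fit <- myglm4(y ~ ., d, weights = rep(1, 100))
coef(fit)  # intercept and slope, as from a direct glm() call
```

The trade-off versus the environment approach is that the literal data and weights are embedded in the stored call, which makes the printed call verbose but keeps the fit self-contained.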
Re: [R] How to obtain individual log-likelihood value from glm?
Thanks Peter for a very promising tip. On Tue, Aug 25, 2020 at 11:40 AM peter dalgaard wrote: > If you don't worry too much about an additive constant, then half the > negative squared deviance residuals should do. (Not quite sure how weights > factor in. Looks like they are accounted for.) > > -pd > > > On 25 Aug 2020, at 17:33 , John Smith wrote: > > > > Dear R-help, > > > > The function logLik can be used to obtain the maximum log-likelihood > value > > from a glm object. This is an aggregated value, a summation of individual > > log-likelihood values. How do I obtain individual values? In the > following > > example, I would expect 9 numbers since the response has length 9. I > could > > write a function to compute the values, but there are lots of > > family members in glm, and I am trying not to reinvent wheels. Thanks! > > > > counts <- c(18,17,15,20,10,20,25,13,12) > > outcome <- gl(3,1,9) > > treatment <- gl(3,3) > > data.frame(treatment, outcome, counts) # showing data > > glm.D93 <- glm(counts ~ outcome + treatment, family = poisson()) > > (ll <- logLik(glm.D93)) > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Office: A 4.23 > Email: pd@cbs.dk Priv: pda...@gmail.com > > > > > > > > > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to obtain individual log-likelihood value from glm?
Dear R-help, The function logLik can be used to obtain the maximum log-likelihood value from a glm object. This is an aggregated value, a summation of individual log-likelihood values. How do I obtain individual values? In the following example, I would expect 9 numbers since the response has length 9. I could write a function to compute the values, but there are lots of family members in glm, and I am trying not to reinvent wheels. Thanks! counts <- c(18,17,15,20,10,20,25,13,12) outcome <- gl(3,1,9) treatment <- gl(3,3) data.frame(treatment, outcome, counts) # showing data glm.D93 <- glm(counts ~ outcome + treatment, family = poisson()) (ll <- logLik(glm.D93)) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Classification Tree Prediction Error
As Bert correctly advised, this is not an R programming question; there is some misunderstanding about how training and test data work together in prediction. Suppose your test data had only one class: betting on that class every time, again using data from the test set itself, would give a misclassification rate of 0! Of course, no classification algorithm can beat a prediction that already exploits the truth in the test data. In conclusion, the tree model you provided has an accuracy of 0.837, which is very close to 0.85. I would not complain.

On Tue, Aug 25, 2020 at 9:19 AM Xu Jun wrote: > Thank you for your comment! This tree function is from the tree package. > Although it might be a pure statistical question, it could be related to > how the tree function is used. I will explore the site that you suggested. > But if there is anyone who can figure it out off the top of their head, I'd > very much appreciate it. > > Jun > > On Mon, Aug 24, 2020 at 1:01 PM Bert Gunter > wrote: > > > Purely statistical questions -- as opposed to R programming queries -- > are > > generally off topic here. > > Here is where they are on topic: https://stats.stackexchange.com/ > > > > Suggestion: when you post, do include the package name where you get > > tree() from, as there might be > > more than one with this function. > > > > Bert Gunter > > > > "The trouble with having an open mind is that people keep coming along > and > > sticking things into it." > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > > > > On Mon, Aug 24, 2020 at 8:58 AM Xu Jun wrote: > > > >> Dear all R experts, > >> > >> I have a question about using cross-validation to assess results > estimated > >> from a classification tree model. I annotated what each line does in > the R > >> code chunk below. Basically, I split the data, named usedta, into 70% > vs.
> >> 30%, with the training set having 70% and the test set 30% of the > original > >> cases. After splitting the data, I first run a classification tree off > the > >> training set, and then use the results for cross-validation using the > test > >> set. It turns out that if I don't have any predictors and make > predictions > >> by simply betting on the majority class of the zero-one coding of the > >> binary response variable, I can do better than what the results from the > >> classification tree would deliver in the test set. What would this imply > >> and what would cause this problem? Does it mean that classification tree > >> is > >> not an appropriate method for my data; or, it's because I have too few > >> variables? Thanks a lot! > >> > >> Jun Xu, PhD > >> Professor > >> Department of Sociology > >> Ball State University > >> Muncie, IN 47306 > >> USA > >> > >> Using the estimates, I get the following prediction rate (correct > >> prediction) using the test set. Or we can say the misclassification > error > >> rate is 1-0.837 = 0.163 > >> > >> > (tab[1,1] + tab[2,2]) / sum(tab)[1] 0.837 > >> > >> > >> Without any predictors, I can get the following rate by betting on the > >> majority class every time, again using data from the test set. In this > >> case, the misclassification error rate is 1-0.85 = 0.15 > >> > >> > table(h2.test)h2.test > >> 1poorHlth 0goodHlth > >> 101 575 > 571/(571+101)[1] 0.85 > >> > >> > >> > >> R Code Chunk > >> > >> # set the seed for random number generator for replication > >> set.seed(47306) > >> # have the 7/3 split with 70% of the cases allotted to the training set > >> # AND create the training set identifier > >> class.train = sample(1:nrow(usedta), nrow(usedta)*0.7) > >> # create the test set indicator > >> class.test = (-class.train) > >> # create a vector for the binary response variable from the test set > >> # for future cross-tabulation. 
> >> h2.test <- usedta$h2[class.test] > >> # count the train set cases > >> Ntrain = length(usedta$h2[class.train]) > >> # run the classification tree model using the training set > >> # h2 is the binary response and other variables are predictors > >> tree.h2 <- tree(h2 ~ age + educ + female + white + married + happy, > >> data = usedta, subset = class.train, > >> control = tree.control(nobs=Ntrain, mindev=0.003)) > >> # summary results > >> summary(tree.h2) > >> # make predictions of h2 using the test set > >> tree.h2.pred <- predict(tree.h2, usedta[class.test,], type="class") > >> # cross tab the predictions using the test set > >> table(tree.h2.pred, h2.test) > >> tab = table(tree.h2.pred, h2.test) > >> # calculate the ratio for the correctly predicted in the test set > >> (tab[1,1] + tab[2,2]) / sum(tab) > >> # calculate the ratio for the correctly predicted using the naive > approach > >> # by betting on the majority category. > >> table(h2.test)[2]/sum(tab) > >> > >> [[alternative HTML version deleted]] > >> > >> __ > >>
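The majority-class baseline discussed in the reply can be reproduced from the counts quoted in the thread. A small sketch (the counts 575 goodHlth vs. 101 poorHlth are taken from the post; everything else is illustrative):

```r
## Test-set class counts as reported in the thread: 575 goodHlth, 101 poorHlth
h2.test <- factor(rep(c("goodHlth", "poorHlth"), c(575, 101)))
majority <- names(which.max(table(h2.test)))   # "goodHlth"
baseline <- mean(h2.test == majority)          # ~0.85, the no-predictor benchmark
round(baseline, 3)
```

This is the benchmark the tree's 0.837 accuracy is being compared against; note that it is computed from the test labels themselves, so no honest model is obliged to beat it.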
Re: [R] [EXT] Re: Plot math symbol with string and number
Thanks David for a quite interesting suggestion: Indeed I didn't know paste0! Best! On Mon, Aug 17, 2020 at 12:26 PM David K Stevens wrote: > John - one more approach > > plot(y,main=parse(text=paste0('data ~~ sigma == ',s))) > > Good luck > > David Stevens > > On 8/17/2020 8:23 AM, John Smith wrote: > > Thanks to Dunkan, Rasmus and Bert. Will keep the very useful tips. Best! > > > > On Sun, Aug 16, 2020 at 9:13 PM Rasmus Liland wrote: > > > >> On Sun, Aug 16, 2020 at 3:18 PM Bert wrote: > >> | On Sun, Aug 16, 2020, 14:53 John wrote: > >> | | > >> | | I would like to make plots with > >> | | titles for different data sets and > >> | | different parameters. The first > >> | | title doesn't show sigma as a math > >> | | symbol, while the second one > >> | | doesn't contain the s value as a > >> | | numeric value > >> | | > >> | | s <- 1 > >> | | y <- rnorm(100) > >> | | plot(y, main=paste("data", "sigma=", s)) > >> | | plot(y, main=expression(paste("data", sigma,"=", s))) > >> | > >> | ?plotmath > >> > >> Dear John, read ?plotmath, it is good, I > >> was not aware of its existence; then > >> backquote s like so: > >> > >> plot(y, main=bquote(paste( > >>"data" ~~ widehat(aleph) > >>%notin% .(s)^.(s > >> > >> V > >> > >> r > >> > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > CAUTION: This email originated from outside of USU. If this appears to > be a USU employee, beware of impersonators. Do not click links, reply, > download images, or open attachments unless you verify the sender’s > identity and know the content is safe. > > > -- > David K Stevens, P.E., Ph.D. 
> Professor, Environmental Engineering > Civil and Environmental Engineering > Utah Water Research Laboratory > 8200 Old Main Hill > Logan, UT 84322-8200 > 435 797 3229 - voice > 435 797 1363 - fax > david.stev...@usu.edu > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Plot math symbol with string and number
Thanks to Duncan, Rasmus and Bert. Will keep the very useful tips. Best! On Sun, Aug 16, 2020 at 9:13 PM Rasmus Liland wrote: > On Sun, Aug 16, 2020 at 3:18 PM Bert wrote: > | On Sun, Aug 16, 2020, 14:53 John wrote: > | | > | | I would like to make plots with > | | titles for different data sets and > | | different parameters. The first > | | title doesn't show sigma as a math > | | symbol, while the second one > | | doesn't contain the s value as a > | | numeric value > | | > | | s <- 1 > | | y <- rnorm(100) > | | plot(y, main=paste("data", "sigma=", s)) > | | plot(y, main=expression(paste("data", sigma,"=", s))) > | > | ?plotmath > > Dear John, read ?plotmath, it is good, I > was not aware of its existence; then > backquote s like so: > > plot(y, main=bquote(paste( > "data" ~~ widehat(aleph) > %notin% .(s)^.(s > > V > > r > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Plot math symbol with string and number
Dear Helpers, I would like to make plots with titles for different data sets and different parameters. So a useful title should combine data name and parameter for clarity. The following is a simplified code example with two plots. The first title doesn't show sigma as a math symbol, while the second one doesn't contain the s value as a numeric value - I could manually change the s value, but when there are many different s values, this is not a good solution. Thanks! s <- 1 y <- rnorm(100) plot(y, main=paste("data", "sigma=", s)) plot(y, main=expression(paste("data", sigma,"=", s))) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
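A sketch combining the two attempts above: bquote() keeps sigma as a plotmath symbol while .(s) splices in the current numeric value (parse(text = paste0(...)) works as well, as suggested later in the thread):

```r
s <- 1
y <- rnorm(100)
## sigma stays a Greek symbol; .(s) is replaced by the value of s
main1 <- bquote("data" ~ sigma == .(s))
plot(y, main = main1)
## equivalent route via a parsed string
main2 <- parse(text = paste0("data ~~ sigma == ", s))
plot(y, main = main2)
```

Because the value is substituted when the title object is built, this works unchanged inside a loop over many different s values.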
[R] offset in glm
It is simple to use the provided glm function as in fit1 below. However, I also tried fit2 below, which has no offset argument. The reason I tried fit2 is that (for X the predictors and b the coefficients)

fit2: log(Claims/Holders) = Xb

means

fit1: log(Claims) = Xb + log(Holders)

Obviously the results from fit2 are different from those of fit1. Thanks!

library("MASS")
## main-effects fit as Poisson GLM with offset
fit1 <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
            data = Insurance, family = poisson)

> coef(fit1)
  (Intercept)     District2     District3     District4       Group.L
-1.8105078329  0.0258681909  0.0385239271  0.2342053280  0.4297075387
      Group.Q       Group.C         Age.L         Age.Q         Age.C
 0.0046324351 -0.0292943222 -0.3944318082 -0.0003549709 -0.0167367565

fit2 <- glm(Claims/Holders ~ District + Group + Age,
            data = Insurance, family = poisson)

> coef(fit2)
(Intercept)   District2   District3   District4     Group.L     Group.Q
-1.86340418  0.17552458  0.11081521  0.15131076  0.43701544 -0.01530721
    Group.C       Age.L       Age.Q       Age.C
-0.06033747 -0.31976743 -0.01833841 -0.01694737
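For what it is worth, the discrepancy between fit1 and fit2 is the missing exposure weighting: the rate model reproduces the offset fit only when each Claims/Holders observation is weighted by its exposure Holders. A sketch (fit3 is my own name; glm warns about non-integer responses for the rate model):

```r
library(MASS)  # Insurance data

## Offset form: log(E[Claims]) = Xb + log(Holders)
fit1 <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
            data = Insurance, family = poisson)

## Same model on the rate scale: weight each rate by its exposure
fit3 <- suppressWarnings(
  glm(Claims/Holders ~ District + Group + Age,
      data = Insurance, family = poisson, weights = Holders))

all.equal(coef(fit1), coef(fit3), tolerance = 1e-6)
```

The two score equations are algebraically identical, so the coefficients agree to convergence tolerance; an unweighted fit2 solves a different estimating equation, which is why its coefficients differ.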
Re: [R] Running Omega in R
Hi David, thank you for your feedback. I didn't realise you couldn't compute omega on a single factor. This definitely makes sense the more I read on it. Thank you all very much for your time.

From: David L Carlson
Sent: Monday 16 January 2017 16:43
To: peter dalgaard; Rui Barradas
Cc: John Smith; r-help@r-project.org; reve...@northwestern.edu
Subject: RE: [R] Running Omega in R

I get an additional error message when I run your example, so perhaps your version is not the current (1.6.12) one?

Loading required namespace: GPArotation
Failed with error: 'there is no package called 'GPArotation''
Error in omegah(m = m, nfactors = nfactors, fm = fm, key = key, flip = flip, :
  I am sorry, you need to have the GPArotation package installed

Then I get a great deal of error output followed by the one you list. The first error is "Omega_h for 1 factor is not meaningful, just omega_t" (repeated 174 times). I'm copying the package maintainer, as this seems like overkill. Your problem is addressed in the first line of the manual page for omega: "McDonald has proposed coefficient omega as an estimate of the general factor saturation of a test. One way to find omega is to do a factor analysis of the original data set, rotate the factors obliquely, do a Schmid Leiman transformation, and then find omega." You cannot rotate a single factor, so you cannot compute omega.
- David L Carlson, Department of Anthropology, Texas A&M University, College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of peter dalgaard
Sent: Sunday, January 15, 2017 2:27 PM
To: Rui Barradas <ruipbarra...@sapo.pt>
Cc: John Smith <jsmith.cours...@outlook.com>; r-help@r-project.org
Subject: Re: [R] Running Omega in R

> On 15 Jan 2017, at 20:57 , Rui Barradas <ruipbarra...@sapo.pt> wrote:
>
> Hello,
>
> I know nothing about package psych, so if you think this is the wrong list you can always try
>
> maintainer("psych")
> [1] "William Revelle <reve...@northwestern.edu>"

...although publishing a package is not the same as issuing a blank cheque for free support, so you may want to dig a little deeper first. I'd check the documentation for whether the data are in the right format. In particular, can "m" be a data frame? If it wants a matrix, you probably need to give it one (m = as.matrix(construct) should do it).

-pd

>
> Hope this helps,
>
> Rui Barradas
>
> Em 15-01-2017 12:32, John Smith escreveu:
>> Hi,
>>
>> I opened a question on stack overflow I'm hoping this mailing list can help with.
[R] Running Omega in R
Hi,

I opened a question on stack overflow I'm hoping this mailing list can help with. I have a dataset below (this is made up but produces the same error I am getting):

structure(list(
Q1 = c(4, 5, 3, 5, 4, 5, 3, 5, 5, 5, 6, 3, 5, 4, 6, 5, 5, 6, 7, 4, 5, 5, 3, 4, 4, 5, 4, 3, 5, 4, 5, 5, 6, 6, 3, 6, 3, 4, 4, 4, 6, 5, 3, 2, 6, 6, 4, 5, 4, 3, 6, 4, 4, 5, 6, 2, 4, 3, 4, 6, 4, 6, 4, 5, 5, 6, 4, 6, 5, 5, 4, 5, 6, 6, 2, 5, 4, 3, 4, 4, 4, 6, 3, 3, 5, 4, 4, 4, 5, 5, 5, 3, 6, 6, 6, 6, 5, 4, 3, 5),
Q2 = c(7, 4, 4, 4, 4, 6, 6, 6, 7, 6, 5, 6, 5, 4, 5, 6, 6, 6, 7, 5, 4, 4, 6, 6, 4, 4, 6, 2, 6, 5, 4, 6, 4, 6, 6, 6, 5, 4, 4, 4, 4, 3, 3, 4, 4, 4, 4, 6, 2, 6, 6, 5, 4, 6, 6, 4, 4, 7, 6, 5, 5, 5, 5, 6, 5, 5, 4, 5, 5, 5, 4, 6, 7, 5, 5, 5, 6, 5, 6, 5, 6, 7, 2, 6, 5, 7, 3, 5, 5, 3, 3, 3, 7, 4, 5, 6, 6, 6, 5, 7),
Q3 = c(5, 4, 5, 6, 4, 4, 5, 4, 2, 6, 5, 5, 5, 5, 7, 5, 5, 6, 7, 6, 3, 6, 6, 6, 5, 6, 6, 5, 5, 4, 5, 5, 6, 6, 5, 6, 5, 5, 4, 4, 6, 4, 4, 4, 4, 4, 4, 5, 5, 4, 5, 5, 4, 3, 5, 4, 5, 6, 6, 6, 4, 5, 5, 5, 6, 4, 5, 5, 7, 4, 5, 6, 6, 5, 5, 3, 3, 5, 4, 6, 5, 5, 1, 3, 5, 3, 2, 5, 4, 6, 6, 6, 6, 4, 6, 3, 6, 6, 6, 5),
Q4 = c(6, 6, 4, 7, 4, 6, 7, 6, 7, 6, 6, 6, 5, 7, 7, 6, 6, 5, 7, 7, 6, 6, 7, 7, 6, 6, 6, 5, 6, 7, 5, 6, 7, 5, 4, 6, 4, 3, 6, 4, 6, 6, 6, 3, 5, 7, 5, 6, 4, 6, 7, 6, 7, 4, 6, 3, 5, 7, 5, 4, 6, 6, 4, 6, 5, 5, 5, 5, 7, 7, 7, 6, 6, 6, 5, 6, 6, 4, 5, 7, 6, 7, 3, 5, 6, 5, 6, 5, 5, 7, 7, 6, 6, 2, 7, 6, 6, 7, 7, 5)),
.Names = c("Q1", "Q2", "Q3", "Q4"), row.names = c(NA, 100L), class = "data.frame")

When I run Cronbach's alpha in R I get a result:

psych::alpha(construct, na.rm = TRUE, title = 'myscale', n.iter = 1000)

When I run omega in R I get the following error message:

"Error in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, :
I am sorry: missing values (NAs) in the correlation matrix do not allow me to continue.
Please drop those variables and try again.
In addition: There were 50 or more warnings (use warnings() to see the first 50)"

psych::omega(m = construct, nfactors = 1, fm = "pa", n.iter = 1000, p = 0.05,
             title = "Omega", plot = FALSE, n.obs = 100)

Stackoverflow link: http://stackoverflow.com/questions/41533231/running-omega-with-psych-library-in-r?noredirect=1#comment70278453_41533231

If I have the wrong mailing list, could you direct me to the appropriate one? (I'm aware it might not be an R issue but more of a stats question.) Thank you for your time.
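Putting the thread's advice together (my own sketch, not code from the thread; `construct` here is an abbreviated stand-in for the full data set above): give omega a numeric matrix and more than one factor, since a single factor cannot be obliquely rotated.

```r
# Abbreviated stand-in for the Q1-Q4 data frame posted above
construct <- data.frame(Q1 = c(4, 5, 3, 5), Q2 = c(7, 4, 4, 4),
                        Q3 = c(5, 4, 5, 6), Q4 = c(6, 6, 4, 7))
m <- as.matrix(construct)   # ensure a numeric matrix, as suggested by -pd
# With the psych and GPArotation packages installed, one would then call e.g.
#   psych::omega(m, nfactors = 3, fm = "pa", n.obs = nrow(m))
stopifnot(is.matrix(m), is.numeric(m))
```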
[R] Creating Volatile Table in Teradata using RODBC
Hi All,

I'm trying to use R to create a temporary (volatile) table in Teradata and then add rows from a data frame into it. Based on the code below (I have changed the SQL slightly), I am able to create the temporary table in my spool space, but when I try to add the data frame mydata to it, R tells me it cannot find the table. My assumption is that it is looking for a real table as opposed to the temp table in spool space. Can anyone see the problem?

# RShowDoc("teradataR", package = "teradataR")  # manual
# RShowDoc("RODBC", package = "RODBC")
library(RODBC)
library(teradataR)

# Import data from text file and remove duplicates
mydata <- read.table("c:/Users/user/Desktop/Keys.txt")
mydata.unique <- unique(mydata)

# Create SQL for temp table in Teradata
strSQL.TempTable <- "CREATE VOLATILE TABLE TEMP_Keys (keys VARCHAR(39)) UNIQUE PRIMARY INDEX(keys) ON COMMIT PRESERVE ROWS;"

# Connect to Teradata DB
channel <- odbcConnect('DB')

# Execute temp table creation, then try to load the data
sqlQuery(channel, paste(strSQL.TempTable))
sqlUpdate(channel, mydata, tablename = "TEMP_Keys", fast = TRUE)

Any help would be greatly appreciated.
KEYWORDS: Volatile, SQL, R, RODBC, teradataR
REF: http://stackoverflow.com/questions/24740751/error-with-creating-volatile-table-in-teradata
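One likely explanation and workaround (my own sketch, untested against a real Teradata system): sqlUpdate()/sqlSave() look the target table up in the database catalog, where a session-local VOLATILE table does not appear, so plain INSERT statements sent over the same connection that created the table avoid the lookup entirely. `keys` below stands in for the single column of mydata.

```r
# Build one INSERT per row of the data; the statements can then be sent
# with sqlQuery() on the connection that created the volatile table,
# since the table only exists within that session.
keys <- c("A1", "B2", "C3")   # stand-in for mydata's key column
inserts <- sprintf("INSERT INTO TEMP_Keys (keys) VALUES ('%s');", keys)
# for (stmt in inserts) sqlQuery(channel, stmt)   # channel as above
inserts[1]
```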
[R] how to change annotations in contour function from rsm package
Dear All,

I am using R 3.0.3 and rsm 2.04. Since I want to do a log transformation on my data, I want to change the "Slice at ..." annotation. For example, I want to change (Slice at W = 1.25, L = 2) in the first graph into (Slice at W = log(1.25), L = log(2)). Can anyone tell me how I can change it? Thanks

library(rsm)
heli.rsm <- rsm(ave ~ block + SO(x1, x2, x3, x4), data = heli)
windows(width = 10.8/2*3, height = 10.8)
par(mfrow = c(2, 3))
contour(heli.rsm, ~ x1 + x2 + x3 + x4, image = TRUE)
[R] how to set pch in auto.key for splom
Dear All,

I am using the most current version of R (3.0.2) and lattice (0.20-24), but cannot set the appropriate pch type with the following code. Could anyone help me?

splom(~iris[1:4], groups = iris$Species, pch = 1:3,
      auto.key = list(space = 'top', columns = 3, pch = 1:3))

Thanks
John
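The usual fix (my addition, not from the thread): with groups=, lattice panels and the key both read their symbol settings from par.settings, so set pch once in superpose.symbol instead of passing it twice:

```r
library(lattice)
# pch set in superpose.symbol keeps panel symbols and key in sync
p <- splom(~ iris[1:4], groups = iris$Species,
           par.settings = list(superpose.symbol = list(pch = 1:3)),
           auto.key = list(space = "top", columns = 3))
print(p)   # lattice objects must be printed to draw
```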
[R] appending and merging data frames
I know that the basic approach to appending or merging data frames is to use the rbind and merge commands. However, if I understand things correctly, for both commands one needs to do quite some additional programming to get, e.g., behavior like the Stata append and merge commands, or to achieve things which I think users need quite frequently. E.g. for appending, the data frames must have identical column names; in order to rename columns, or to add columns filled with missing values where necessary, additional programming is needed. For merging, all matches get combined, so it is not easily possible to check for 1:1 or 1:n matches or to limit the join to such matches, is it? Those are just examples; there are a number of additional details that it would be useful to be able to control when merging/appending (maybe at the expense of restricting the operation to data frames). So my question is: are there any packages or existing utility functions which provide append and merge functionality at a slightly higher (more user-friendly) level? Although I am quite a noob, I would be prepared to give it a try and program these myself, but I have the feeling that this must be so common that it would mean re-inventing the wheel?
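For the append case, a small base-R sketch (my own illustration, not from the thread) that pads missing columns with NA before rbind; packages such as plyr (rbind.fill) provide the same behaviour ready-made:

```r
# Stata-append-style rbind: columns missing on either side are added
# and filled with NA before the two data frames are stacked.
append_df <- function(a, b) {
  a[setdiff(names(b), names(a))] <- NA
  b[setdiff(names(a), names(b))] <- NA
  rbind(a, b[names(a)])   # reorder b's columns to match a
}

x <- data.frame(id = 1:2, v = c(10, 20))
y <- data.frame(id = 3, w = 99)
append_df(x, y)
```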
Re: [R] appending and merging data frames
Hi Gerrit, as I said in my original email, I already know both the merge and the rbind commands, but I think that many standard situations (I have given just a few) require rather clumsy ad-hoc programming. So I was wondering if there are any packages or existing code that would make it easier to handle the diverse append/merge tasks that tend to occur rather frequently. For at least some of these tasks, other stats packages like Stata do provide out-of-the-box commands or command options. Again, I just want to know if there is already some package or code for this out there somewhere, especially for appending tasks (automatically taking care of missing variables, renaming, data types etc.). The RSiteSearch() function is helpful and came up with slightly different versions of merge, but nothing for append.

John

-----Original Message-----
From: gerrit.eich...@math.uni-giessen.de
Sent: Thu, 7 Feb 2013 16:57:13 +0100 (MET)
To: johsmi9...@inbox.com
Subject: Re: [R] appending and merging data frames

Hello, John,

as a start take a look at

?merge

And to (maybe) get a bit overwhelmed at first sight use

RSiteSearch("merge")

Hth -- Gerrit

On Thu, 7 Feb 2013, John Smith wrote:

I know that the basic approach to append or merge data frames is using the rbind and merge commands. However, if I understand things correctly, for both commands one needs to do quite some additional programming to get e.g. behavior as with the Stata append and merge commands or to achieve some things which I think users need quite frequently. E.g. for appending, the data frame must have identical column names. In order to rename columns or in order to add columns with missing values if necessary, additional programming is needed. For merging, all matches get combined, so it is not easily possible to check for 1:1 or 1:n matches or limit the join to such kind of matches, is it?
Those are just examples, there are a number of additional details that would be useful to be able to control for merging/appending (maybe at the expense of restricting the operation to just data frames). So my question is: are there any packages or existing utility functions which would provide append and merge functionality at a slightly higher (user-friendly) level? Although I am quite a noob, I would be prepared to give it a try and program these myself, but I have the feeling that this must be so common that maybe it would mean re-inventing the wheel?
Re: [R] indexing operation based upon a sequence
I think this is because some decimal numbers with a fixed number of digits after the decimal point cannot be represented exactly as base-2 floating point numbers with a limited number of digits. So when one adds them up, the result is slightly different from the exact sum:

huh1 <- seq(.1, .7, .15)
huh1 - 0.55
# [1] -4.50e-01 -3.00e-01 -1.50e-01 -1.110223e-16  1.50e-01

For == to give TRUE, the numbers have to be exactly identical. R shows those numbers rounded to two places, so the difference is not immediately visible.

-----Original Message-----
From: j.brae...@uvt.nl
Sent: Thu, 7 Feb 2013 16:41:49 +0000
To: r-help@r-project.org
Subject: [R] indexing operation based upon a sequence

Dear R-list,

We stumbled upon some weird problem when performing a simple indexing operation. Below some example code to illustrate the issue.

# FAILS TO FIND .55
huh1 <- seq(.1, .7, .15); huh1
# [1] 0.10 0.25 0.40 0.55 0.70
huh1 == .25
# [1] FALSE  TRUE FALSE FALSE FALSE
huh1 == .55
# [1] FALSE FALSE FALSE FALSE FALSE
huh1 == 0.55
# [1] FALSE FALSE FALSE  TRUE FALSE

Hence somehow one element in the sequence has become a character?

# DOES NOT FAIL TO FIND .55
huh2 <- c(.1, .25, .4, .55, .7); huh2
# [1] 0.10 0.25 0.40 0.55 0.70
huh2 == .25
# [1] FALSE  TRUE FALSE FALSE FALSE
huh2 == .55
# [1] FALSE FALSE FALSE  TRUE FALSE

This is what you would expect.

# DOES NOT FAIL TO FIND .55
huh3 <- seq(.1, .7, .05); huh3
huh3 == .25
# [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
huh3 == .55
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE

Strangely, when making a longer sequence, the weird behavior does not persist. Does anyone have a clue what is going on here?
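The standard remedy (my addition, not from the thread) is to compare within a tolerance rather than with ==, e.g. via all.equal() or an explicit bound:

```r
huh1 <- seq(0.1, 0.7, 0.15)
# Floating-point-safe comparison: equal within a small tolerance
near <- function(x, y, tol = 1e-8) abs(x - y) < tol
which(near(huh1, 0.55))             # finds the element that == misses
isTRUE(all.equal(huh1[4], 0.55))    # same idea via all.equal()
```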
platform i386-pc-mingw32
version.string R version 2.12.1 (2010-12-16)

Best regards,
Johan
Re: [R] appending and merging data frames
I gave a small number of examples in my first email. For append: in Stata, any variable that is missing in the other dataset is automatically inserted with all values set to the missing value. In R, one would first have to compare the columns in both data frames and generate columns appropriately. For merge: when merging dataset A with dataset B on some set of key variables, it is possible to specify whether the join should be done 1:1, 1:n, n:1 or n:m. Stata creates a data structure that contains information about which observations could or could not be matched and which matches did not fit the 1:1/1:n/n:1 join pattern. Again, I know that in R it is not hard to program all that yourself (while programming in Stata is something different altogether); I just wondered whether, for these operations, which probably are needed quite often, there are additional packages or existing code that already deal with most of the situations one encounters when appending/merging. The background is that I am making the move from other stats packages (including Stata) to R myself, and I also want to motivate others to do it.

-----Original Message-----
From: dwinsem...@comcast.net
Sent: Thu, 7 Feb 2013 09:46:02 -0800
To: johsmi9...@inbox.com
Subject: Re: [R] appending and merging data frames

On Feb 7, 2013, at 9:12 AM, John Smith wrote:

Hi Gerrit, as I said in my original email, I already know both the merge and the rbind commands, but I think that many standard situations (I have given just a few) require rather clumsy ad-hoc programming. So I was wondering if there are any packages or existing code that would make it easier to handle the diverse append/merge tasks that tend to occur rather frequently.
For at least some of these tasks, other stats packages, like Stata, do provide out-of-the-box commands or command options. Again, I just want to know if there is already some package or code for this out there somewhere, especially for appending tasks (automatically taking care of missing variables, renaming, data types etc.). The RSiteSearch() function is helpful and came up with slightly different versions of merge but nothing for append.

The sqldf package provides an interface to popular database drivers. I do not think your question provides enough specificity to go much further. You say Stata provides ... something ... but you do not really explain what that something is. My efforts to understand the Stata documentation for the egen command left me shaking my head in disbelief at its opacity, and caused me to appreciate further the efforts of the R developers to make our help system available. The most productive approach would be to present a simple example in R code.

-- David.

John

-----Original Message-----
From: gerrit.eich...@math.uni-giessen.de
Sent: Thu, 7 Feb 2013 16:57:13 +0100 (MET)
To: johsmi9...@inbox.com
Subject: Re: [R] appending and merging data frames

Hello, John, as a start take a look at ?merge. And to (maybe) get a bit overwhelmed at first sight use RSiteSearch("merge"). Hth -- Gerrit

On Thu, 7 Feb 2013, John Smith wrote:

I know that the basic approach to append or merge data frames is using the rbind and merge commands. However, if I understand things correctly, for both commands one needs to do quite some additional programming to get e.g. behavior as with the Stata append and merge commands or to achieve some things which I think users need quite frequently. E.g. for appending, the data frame must have identical column names. In order to rename columns or in order to add columns with missing values if necessary, additional programming is needed.
For merging, all matches get combined, so it is not easily possible to check for 1:1 or 1:n matches or limit the join to such kind of matches, is it? Those are just examples, there are a number of additional details that would be useful to be able to control for merging/appending (maybe at the expense of restricting the operation to just data frames). So my question is: are there any packages or existing utility functions which would provide append and merge functionality at a slightly higher (user-friendly) level? Although I am quite a noob, I would be prepared to give it a try and program these myself, but I have the feeling that this must be so common that maybe it would mean re-inventing the wheel?
Re: [R] appending and merging data frames
One rather extreme limitation in Stata is that there is always only one active or loaded dataset, so all the commands that involve more than one dataset operate on the loaded "master" dataset and one or more other datasets stored somewhere. Merge in Stata joins the loaded master dataset with another one on the hard disk. There are many options that influence the behavior, but roughly this is what can be done: When one specifies one of the join options 1:1, 1:n or n:1, an error is shown if the key variables do not uniquely identify observations on the "1" side (so with regard to your question it is (b)). After the merge, a column is added to the resulting dataset that indicates whether the observation was matched, occurred in the master dataset only, or occurred in the other dataset only. Additional options allow one to restrict what will be kept in the merge, e.g. only observations that occur in the master or in both, and in the same way one can restrict which pairings are allowed without raising an error. Another set of options specifies how variables that occur in both datasets but are not key variables are updated: e.g. always from the master, always from the other dataset, or the master gets updated only if the current value is missing. A merge can also be done purely on observation number (no key variables). Further options control how value labels (how factors are represented as readable strings) and variable notes are merged.

-----Original Message-----
From: wdun...@tibco.com
Sent: Thu, 7 Feb 2013 18:18:26 +0000
To: johsmi9...@inbox.com, dwinsem...@comcast.net
Subject: RE: [R] appending and merging data frames

For merge, when merging dataset A with dataset B on some set of key variables, it is possible to specify if the join should be done 1:1, 1:n, n:1 or n:m. Stata creates a data structure that contains information about which observations could or could not be matched and which matches did not fit the 1:1/1:n or n:1 join pattern.
I don't know Stata and am curious about the above. If you ask for a 1:n join but your key column[s] in your first input have duplicates, what does Stata do? Does it (a) use the first of the duplicates to produce an answer and also return an object describing the problem (b) refuse to do the merge and return an object describing the problem or something else? How is the data structure containing information about problems in the merge connected to the output of merge? Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John Smith Sent: Thursday, February 07, 2013 9:59 AM To: David Winsemius Cc: r-help@r-project.org Subject: Re: [R] appending and merging data frames I gave a small number of examples in my first email. For append, in Stata any variable that is missing in the other dataset is automatically inserted with all values set to the missing value. In R, one would first have to compare the columns in both data frames and generate columns apropriately. For merge, when merging dataset A with dataset B on some set of key variables, it is possible to specify if the join should be done 1:1, 1:n, n:1 or n:m. Stata creates a data structure that contains information about which observations could or could not be matched and which matches did not fit the 1:1/1:n or n:1 join pattern. Again, I know that in R it is not hard to program all that yourself (while programming in Stata is something different altogether), I just wondered if for these operations, which probably are needed quite often, there are some additional packages or existing code that already deals with most of the situations one encounters when appending/merging. The background is that I am making the move from other stats packages (including Stata) to R myself and I also want to motivate others to do it. 
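As a rough R analogue of the Stata behaviour described in the thread (my own sketch): merge with all = TRUE gives the full outer join, a derived column plays the role of Stata's _merge indicator, and anyDuplicated() can enforce the "1" side of a 1:n join:

```r
a <- data.frame(id = c(1, 2, 3), x = c(10, 20, 30))     # "master"
b <- data.frame(id = c(2, 3, 4), y = c("p", "q", "r"))  # "using"

stopifnot(!anyDuplicated(a$id))   # error out if the "1" side has duplicates

m <- merge(a, b, by = "id", all = TRUE)   # full outer join
# Stata-like _merge flag: 1 = master only, 2 = using only, 3 = matched
m$merge_flag <- ifelse(is.na(m$y), 1L, ifelse(is.na(m$x), 2L, 3L))
m
```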
[R] use xyplot to plot mean and CI by groups
Dear R users,

I am trying to use xyplot to draw group means and CIs. Below is sample code. But I want to: 1. use different colors and symbols to draw the individual points, the CIs, and the lines connecting group means across time points; 2. add jitter on the x axis so the CIs do not overlap. Could anyone modify the attached code to achieve this? Thanks

library(lattice)
set.seed(123)
src <- data.frame(time = rep(c(1, 6, 24), rep(6, 3)),
                  trt = rep(rep(c('C', 'T'), each = 3), 3))
src <- transform(src, x = sqrt(time) + 2*(trt == 'C') + rnorm(18),
                 trt = ordered(trt, levels = c('C', 'T')))
src <- src[-5, ]
src$m <- ave(src$x, src[c('time', 'trt')], FUN = mean)
src$sd <- ave(src$x, src[c('time', 'trt')], FUN = sd)
src$n <- ave(src$x, src[c('time', 'trt')], FUN = length)
src <- transform(src, se = sd/sqrt(n))
src <- transform(src, cl = m - se, cu = m + se)

prepanel.ci <- function(x, y, ly, uy, subscripts, ...) {
  x <- as.numeric(x)
  ly <- as.numeric(ly[subscripts])
  uy <- as.numeric(uy[subscripts])
  list(ylim = range(y, uy, ly, finite = TRUE))
}

panel.ci <- function(x, y, ly, my, uy, subscripts, ...) {
  x <- as.numeric(x); y <- as.numeric(y)
  my <- as.numeric(my[subscripts])
  ly <- as.numeric(ly[subscripts]); uy <- as.numeric(uy[subscripts])
  panel.xyplot(x, y, ...)
  panel.arrows(x, ly, x, uy, length = 0.25, unit = "native",
               angle = 90, code = 3)
  panel.lines(x, my)
}

par.settings <- list(
  superpose.symbol = list(col = c("red", "green", "blue"),
                          fill = c("red", "green", "blue"), pch = 15:17),
  superpose.line = list(col = c("red", "green", "blue")))

windows(width = 10.67, height = 6.60)
xyplot(x ~ time, xlab = "Time", ylab = "X", groups = trt, data = src,
       trt = src$trt, ly = src$cl, uy = src$cu, mx = src$time, my = src$m,
       type = "p", prepanel = prepanel.ci, panel = panel.superpose,
       panel.groups = panel.ci,
       auto.key = list(space = "top", points = TRUE, lines = TRUE, columns = 2),
       par.settings = par.settings)
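One way to get the jitter asked for in point 2 (my own sketch, not from the thread): shift x by a small group-dependent offset inside the panel.groups function, using the group.number argument that panel.superpose passes along:

```r
# panel.superpose passes group.number to its panel.groups function, so
# a per-group horizontal offset separates overlapping CIs: group 1 is
# shifted left of the true time, group 2 right (for d and 2 groups).
offset_x <- function(x, group.number, d = 0.3) x + (group.number - 1.5) * d
# e.g. at the top of panel.ci one would use:
#   x <- offset_x(as.numeric(x), group.number)
offset_x(c(1, 6, 24), group.number = 1)
offset_x(c(1, 6, 24), group.number = 2)
```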
[R] Performing t tests between matrices
I need to perform 1 T tests.

# I have two populations with different means
Popc1 <- rnorm(10, 10, 2)
Popc2 <- rnorm(10, 8, 2)

# I created two sets of samples - each set has 1 samples, and I made a
# matrix of 20 rows and 1 columns to fit the data
sampc1 <- matrix(, 20, 1)
for (j in 1:1) { sampc1[1:20, j] <- sample(Popc1, 20) }
sampc2 <- matrix(, 20, 1)
for (j in 1:1) { sampc2[1:20, j] <- sample(Popc2, 20) }

# I wrote a blank matrix with 1 row and 1 columns. I am planning to fill
# this matrix with the t values that I calculate for each pair of samples
t <- matrix(, 1, 1)

# What I need to do is to calculate the t value for each pair of samples.
# So if I were to do it manually, for the first sample in each set, I
# would do something like:
t.test(sampc1[, 1], sampc2[, 1])

# And I would continue doing this all the way until the 1th pair of
# samples, so
t.test(sampc1[, 2], sampc2[, 2])
t.test(sampc1[, 3], sampc2[, 3])
# and so on. But I need to do this for all 1 pairs of samples.

# I tried to write a loop
for (j in 1:1) { t[, 1:1] <- t.test(sampc1[, j], sampc2[, j])$statistic }
# but this fills each column in the matrix with the same exact t value.

# Can someone tell me how to write the code to perform each pair of t
# tests? I think I am missing something in my loop

Thanks
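The loop assigns to every column at once (`t[, 1:...] <- ...`), so each iteration overwrites the whole matrix and only the last statistic survives; indexing with j, or using sapply, fixes it. A sketch (my addition; the counts in the post appear garbled, so 1000 sample pairs and populations of 100,000 are assumed here):

```r
set.seed(1)
n <- 1000                         # assumed number of sample pairs
Popc1 <- rnorm(1e5, 10, 2)        # assumed population sizes
Popc2 <- rnorm(1e5, 8, 2)

# Each column is one sample of 20 from the population
sampc1 <- replicate(n, sample(Popc1, 20))
sampc2 <- replicate(n, sample(Popc2, 20))

# One t statistic per column pair; sapply indexes each pair correctly,
# avoiding the all-columns assignment that broke the original loop
tvals <- sapply(seq_len(n), function(j)
  t.test(sampc1[, j], sampc2[, j])$statistic)
length(tvals)
```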
Re: [R] can not use plot.Predict {rms} reproduce figure 7.8 from Regression Modeling Strategies (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/course2.pdf)
Sorry. I am still using the 9-11 March 2011 version of course2.pdf.

On Wed, Mar 14, 2012 at 5:52 PM, David Winsemius dwinsem...@comcast.net wrote:

On Mar 14, 2012, at 4:09 PM, John Smith wrote:

With the most current versions of R and rms, the 4 curves are drawn in 4 separate panels. Can anyone show me how I can get the figure exactly like figure 7.8 in *Regression Modeling Strategies* (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/course2.pdf)?

In case anyone else is scratching their head wondering what connection there might be between figure 7.8 in that document (which happens to be a nomogram), they should realize that the questioner is most probably asking about 7.8 in the RMS *book*, and that there is no such figure in chapter 7 of the pdf document. (Nor, for that matter, does a search for "democrat" bring up any hits in the document, so I don't think the code comes from there either.) I suspect the data is here: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/counties.sav

-- David.

On Tue, May 17, 2011 at 4:04 PM, John Smith zmr...@gmail.com wrote:

Dear R-users,

I am using R 2.13.0 and rms 3.3-0, but cannot reproduce figure 7.8 of the handouts *Regression Modeling Strategies* (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/course2.pdf) with the following code. Could anyone help me figure out how to solve this?

setwd('C:/Rharrell')
require(rms)
load('data/counties.sav')
older <- counties$age6574 + counties$age75
label(older) <- '% age >= 65, 1990'
pdensity <- logb(counties$pop.density + 1, 10)
label(pdensity) <- 'log 10 of 1992 pop per 1990 miles^2'
counties <- cbind(counties, older, pdensity)  # add 2 vars. not in data frame
dd <- datadist(counties)
options(datadist='dd')
f <- ols(democrat ~ rcs(pdensity,4) + rcs(pop.change,3) + rcs(older,3) +
           crime + rcs(college,5) + rcs(income,4) +
           rcs(college,5) %ia% rcs(income,4) +
           rcs(farm,3) + rcs(white,5) + rcs(turnout,3),
         data=counties)
incomes <- seq(22900, 32800, length=4)
show.pts <- function(college.pts, income.pt) {
  s <- abs(income - income.pt) < 1650
  # Compute 10th smallest and 10th largest % college educated in
  # counties with median family income within $1650 of the target income
  x <- college[s]
  x <- sort(x[!is.na(x)])
  n <- length(x)
  low <- x[10]; high <- x[n-9]
  college.pts >= low & college.pts <= high
}
windows()
plot(Predict(f, college, income=incomes, conf.int=FALSE),
     xlim=c(0,35), ylim=c(30,55), lty=1, lwd=c(.25,1.5,3.5,6),
     col=c(1,1,2,2), perim=show.pts)

David Winsemius, MD
West Hartford, CT
Re: [R] can not use plot.Predict {rms} reproduce figure 7.8 from Regression Modeling Strategies (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/course2.pdf)
With the most current versions of R and rms, the 4 curves are drawn in 4 separate panels. Can anyone show me how I can get the figure exactly like figure 7.8 in *Regression Modeling Strategies* (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/course2.pdf)?

Thanks

On Tue, May 17, 2011 at 4:04 PM, John Smith zmr...@gmail.com wrote:

Dear R-users,

I am using R 2.13.0 and rms 3.3-0, but cannot reproduce figure 7.8 of the handouts *Regression Modeling Strategies* (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/course2.pdf) with the following code. Could anyone help me figure out how to solve this?

setwd('C:/Rharrell')
require(rms)
load('data/counties.sav')
older <- counties$age6574 + counties$age75
label(older) <- '% age >= 65, 1990'
pdensity <- logb(counties$pop.density + 1, 10)
label(pdensity) <- 'log 10 of 1992 pop per 1990 miles^2'
counties <- cbind(counties, older, pdensity)  # add 2 vars. not in data frame
dd <- datadist(counties)
options(datadist='dd')
f <- ols(democrat ~ rcs(pdensity,4) + rcs(pop.change,3) + rcs(older,3) +
           crime + rcs(college,5) + rcs(income,4) +
           rcs(college,5) %ia% rcs(income,4) +
           rcs(farm,3) + rcs(white,5) + rcs(turnout,3),
         data=counties)
incomes <- seq(22900, 32800, length=4)
show.pts <- function(college.pts, income.pt) {
  s <- abs(income - income.pt) < 1650
  # Compute 10th smallest and 10th largest % college educated in
  # counties with median family income within $1650 of the target income
  x <- college[s]
  x <- sort(x[!is.na(x)])
  n <- length(x)
  low <- x[10]; high <- x[n-9]
  college.pts >= low & college.pts <= high
}
windows()
plot(Predict(f, college, income=incomes, conf.int=FALSE),
     xlim=c(0,35), ylim=c(30,55), lty=1, lwd=c(.25,1.5,3.5,6),
     col=c(1,1,2,2), perim=show.pts)

Thanks
Re: [R] general question on Spotfire
After one month of struggle, I have lost confidence in Spotfire; it is not user-friendly at all. Yes, it is very good for certain kinds of data presentation, but you still need to do the heavy lifting of data processing in other software. For me, R is the perfect tool and has the best user-support community. I send this email just for the statisticians like me: I don't think Spotfire will improve your efficiency.

On Sun, Jan 15, 2012 at 5:22 PM, Frank Harrell f.harr...@vanderbilt.edu wrote:

Thanks for your note Uwe. Yes, I think a lot of the needed work was related to implementing R functions that many of us use that are not available in S-PLUS, plus what to do about plotmath. It wasn't enough to just be able to load the R package. I don't think implementation of the needed R functions in S-PLUS ever happened.
Frank

Uwe Ligges-3 wrote:

On 12.01.2012 17:38, Frank Harrell wrote:

As a slight aside, Tibco/Spotfire originally planned to provide a capability to load R packages into S-PLUS. This always seemed to me to be a hard thing to do, and if my understanding is correct, this proved to be too difficult to do in S-PLUS, at least for large packages such as mine.

Frank, this dates back to the times of Insightful. They already did that and had a module that allowed loading R packages. Of course, these had to be S-PLUS compatible (which is not really easy any more if package authors made use of R functionality that exceeds the features of S-PLUS). They even hired people to make some R packages S-PLUS compatible; R2WinBUGS was just one example that was available from their CSAN repositories. (Nowadays R2WinBUGS is no longer S-PLUS compatible, since the authors do not care too much and do not have S-PLUS licenses to check it.)
Best, Uwe

Terry Therneau-2 wrote:

John,
Spotfire is a menu-driven data exploration tool, very popular here with biologists who found that their previous Excel-based approach doesn't cut it for large data sets. When TIBCO wanted to expand the tool with further quantitative features, they made (I think) a bright decision to purchase S-PLUS and integrate it as a back end, rather than try to write dozens of new modules in house. The S-PLUS vs R aspect of the list responses misses the main point, however. Spotfire is designed to let you nose around in a data set, quickly plotting various aspects and zooming in on subsets (imagine a mouse-based version of the pinch metaphor used on the iPhone), etc. It is a useful and very well designed tool; one demo was enough to make the sale, and early growth here was explosive. But if you already know R you can do those graphs already, albeit quite a bit slower. I decided not to pursue proficiency in Spotfire, but that was partly because it's Windows-based and I prefer Unix. Also most of my work is at the post-exploration phase, and I would have flipped back to straight R for that anyway.
Terry Therneau

-
Frank Harrell
Department of Biostatistics, Vanderbilt University

--
View this message in context: http://r.789695.n4.nabble.com/general-question-on-Spotfire-tp4285758p4297916.html
Sent from the R help mailing list archive at Nabble.com.
[R] general question on Spotfire
Dear R users,

I have been using R for 10 years, and I love it very much. But in my daily job in drug discovery, some people use Spotfire. I tried Spotfire on a couple of data sets. It seems I still need to do some data manipulation before plotting figures. For example, I cannot plot a figure with data arranged in rows (is this true, or am I stupid?). So far I don't see any benefit Spotfire can provide over R. I am just wondering whether that is because I am new to Spotfire, or whether Spotfire really is not a good tool for a statistician. Also, could anyone give me any suggestions on how to learn Spotfire? Thanks

John
Re: [R] general question on Spotfire
I am struggling with whether I should learn Spotfire or not. I just want some statisticians' input. Thanks

On Wed, Jan 11, 2012 at 10:28 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

On 12-01-11 10:13 AM, John Smith wrote:

Dear R users, I have been using R for 10 years, and I love it very much. But in my daily job in drug discovery, some people use Spotfire. I tried Spotfire on a couple of data sets. It seems I still need to do some data manipulation before plotting figures. For example, I cannot plot a figure with data arranged in rows (is this true, or am I stupid?). So far I don't see any benefit Spotfire can provide over R. I am just wondering whether that is because I am new to Spotfire, or whether Spotfire really is not a good tool for a statistician. Also, could anyone give me any suggestions on how to learn Spotfire?

Shouldn't you be asking this question of Spotfire users?

Duncan Murdoch
[R] can not use plot.Predict {rms} reproduce figure 7.8 from Regression Modeling Strategies (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/course2.pdf)
Dear R-users,

I am using R 2.13.0 and rms 3.3-0, but cannot reproduce figure 7.8 of the handouts *Regression Modeling Strategies* (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/course2.pdf) with the following code. Could anyone help me figure out how to solve this?

setwd('C:/Rharrell')
require(rms)
load('data/counties.sav')
older <- counties$age6574 + counties$age75
label(older) <- '% age >= 65, 1990'
pdensity <- logb(counties$pop.density + 1, 10)
label(pdensity) <- 'log 10 of 1992 pop per 1990 miles^2'
counties <- cbind(counties, older, pdensity)  # add 2 vars. not in data frame
dd <- datadist(counties)
options(datadist='dd')
f <- ols(democrat ~ rcs(pdensity,4) + rcs(pop.change,3) + rcs(older,3) +
           crime + rcs(college,5) + rcs(income,4) +
           rcs(college,5) %ia% rcs(income,4) +
           rcs(farm,3) + rcs(white,5) + rcs(turnout,3),
         data=counties)
incomes <- seq(22900, 32800, length=4)
show.pts <- function(college.pts, income.pt) {
  s <- abs(income - income.pt) < 1650
  # Compute 10th smallest and 10th largest % college educated in
  # counties with median family income within $1650 of the target income
  x <- college[s]
  x <- sort(x[!is.na(x)])
  n <- length(x)
  low <- x[10]; high <- x[n-9]
  college.pts >= low & college.pts <= high
}
windows()
plot(Predict(f, college, income=incomes, conf.int=FALSE),
     xlim=c(0,35), ylim=c(30,55), lty=1, lwd=c(.25,1.5,3.5,6),
     col=c(1,1,2,2), perim=show.pts)

Thanks
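If the goal is simply to get the four income curves into one panel rather than four, one possibility (a sketch, not from the book; it assumes your rms version's `plot.Predict` accepts a `groups` argument for superpositioning) is:

```r
# Sketch: superpose the four income curves in a single panel
# (assumes plot.Predict supports a 'groups' superposition argument;
# f, incomes and show.pts are from the quoted code)
p <- Predict(f, college, income = incomes, conf.int = FALSE)
plot(p, groups = 'income', perim = show.pts,
     xlim = c(0, 35), ylim = c(30, 55))
```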
Re: [R] can not use plot.Predict {rms} reproduce figure 7.8 from Regression Modeling Strategies (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/course2.pdf)
Also, I cannot reproduce figure 7.11 with:

f <- Newlabels(f, list(turnout='voter turnout (%)'))
windows()
nomogram(f, interact=list(income=incomes), turnout=seq(30,100,by=10),
         lplabel='estimated % voting Democratic', cex.var=.8, cex.axis=.75)

On Tue, May 17, 2011 at 4:04 PM, John Smith zmr...@gmail.com wrote:

Dear R-users,

I am using R 2.13.0 and rms 3.3-0, but cannot reproduce figure 7.8 of the handouts *Regression Modeling Strategies* (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/course2.pdf) with the following code. Could anyone help me figure out how to solve this?

setwd('C:/Rharrell')
require(rms)
load('data/counties.sav')
older <- counties$age6574 + counties$age75
label(older) <- '% age >= 65, 1990'
pdensity <- logb(counties$pop.density + 1, 10)
label(pdensity) <- 'log 10 of 1992 pop per 1990 miles^2'
counties <- cbind(counties, older, pdensity)  # add 2 vars. not in data frame
dd <- datadist(counties)
options(datadist='dd')
f <- ols(democrat ~ rcs(pdensity,4) + rcs(pop.change,3) + rcs(older,3) +
           crime + rcs(college,5) + rcs(income,4) +
           rcs(college,5) %ia% rcs(income,4) +
           rcs(farm,3) + rcs(white,5) + rcs(turnout,3),
         data=counties)
incomes <- seq(22900, 32800, length=4)
show.pts <- function(college.pts, income.pt) {
  s <- abs(income - income.pt) < 1650
  # Compute 10th smallest and 10th largest % college educated in
  # counties with median family income within $1650 of the target income
  x <- college[s]
  x <- sort(x[!is.na(x)])
  n <- length(x)
  low <- x[10]; high <- x[n-9]
  college.pts >= low & college.pts <= high
}
windows()
plot(Predict(f, college, income=incomes, conf.int=FALSE),
     xlim=c(0,35), ylim=c(30,55), lty=1, lwd=c(.25,1.5,3.5,6),
     col=c(1,1,2,2), perim=show.pts)

Thanks
[R] how to delete empty levels from lattice xyplot
Hello All,

I am trying to use the attached code to produce a cross-over plot. There are 13 subjects, 7 of them in the for/sal group and 6 of them in the sal/for group. But in xyplot, all the subjects are listed in both subgraphs. Could anyone help me figure out how to get rid of the empty levels? Thanks

library(lattice)
pef1 <- c(310,310,370,410,250,380,330,370,310,380,290,260,90)
pef2 <- c(270,260,300,390,210,350,365,385,400,410,320,340,220)
id <- c(1,4,6,7,10,11,14,2,3,5,9,12,13)
sequence <- c(rep('for/sal', 7), rep('sal/for', 6))
treat1 <- c(rep('for', 7), rep('sal', 6))
treat2 <- c(rep('sal', 7), rep('for', 6))
study <- data.frame(id, sequence, treat1, pef1, treat2, pef2)
studyLong <- as.data.frame(rbind(as.matrix(study[, c('id', 'sequence', 'treat1', 'pef1')]),
                                 as.matrix(study[, c('id', 'sequence', 'treat2', 'pef2')])))
colnames(studyLong) <- c('id', 'sequence', 'treat', 'pef')
xyplot(pef ~ id | sequence, groups=treat, data=studyLong,
       auto.key=list(columns=2))
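One hedged way to attack this (a sketch, not a tested answer): rebuild the long data with `reshape()` so the columns keep their types (the `rbind(as.matrix(...))` construction coerces everything to character), then give the x-axis a free relation so each panel shows only the subjects present in its own packet:

```r
# Assumes 'study' from the quoted code already exists
studyLong2 <- reshape(study, direction = 'long',
                      varying = list(c('treat1', 'treat2'),
                                     c('pef1', 'pef2')),
                      v.names = c('treat', 'pef'),
                      timevar = 'period', idvar = 'id')
studyLong2$id <- factor(studyLong2$id)

library(lattice)
xyplot(pef ~ id | sequence, groups = treat, data = studyLong2,
       auto.key = list(columns = 2),
       # a free x relation lets each panel keep only its own subjects
       scales = list(x = list(relation = 'free')))
```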
[R] lattice auto.key gives mismatch colors
Hello All,

I am using the following code to draw a figure, but the legend given by auto.key has mismatched colors. Could anyone help me? I am using R 2.12.1 and the most current lattice on Windows XP. Thanks

John

library(lattice)
src <- data.frame(t=rep(c('A','B','C','D'), rep(8,4)),
                  s=rep(c(8132,8140,8178,8180,8224,8230,8337,8345), 4),
                  v=c(55.10, 56.00, 206.00, 5.86, 164.00, 102.00, 171.00, 280.00,
                      236.00, 91.10, 238.00, 102.00, 59.30, 227.00, 280.00, 316.00,
                      205.00, 120.00, 273.00, 98.80, 167.00, 104.00, 155.00, 370.00,
                      215.00, 97.60, 133.00, 135.00, 48.60, 135.00, 77.10, 91.90))
colors <- rgb(c(228, 55, 77, 152, 255, 255, 166, 247),
              c(26, 126, 175, 78, 127, 255, 86, 129),
              c(28, 184, 74, 163, 0, 51, 40, 191),
              maxColorValue=255)
xyplot(v ~ t, groups=s, type='o', data=src, col=colors,
       auto.key = list(points=TRUE, columns = 4, col=colors))
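A common cause of this mismatch is that `col=` passed to `xyplot()` styles the panel while `auto.key`'s `col=` styles the legend independently of the plotting order. A sketch of the usual remedy (routing the colours through `par.settings`, so that panel and legend both read the same superpose settings):

```r
# Assumes 'src' and 'colors' from the quoted code already exist
library(lattice)
xyplot(v ~ t, groups = s, type = 'o', data = src,
       # set group colours once; auto.key picks them up automatically
       par.settings = list(superpose.symbol = list(col = colors),
                           superpose.line   = list(col = colors)),
       auto.key = list(points = TRUE, columns = 4))
```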
[R] need help in understanding R code, and maybe some math
Hi,

I am trying to implement Higham's algorithm for correcting a non positive definite covariance matrix. I found this code in R: http://projects.cs.kent.ac.uk/projects/cxxr/trac/browser/trunk/src/library/Recommended/Matrix/R/nearPD.R?rev=637

I managed to understand most of it; the only line I really don't understand is this one:

X <- tcrossprod(Q * rep(d[p], each=nrow(Q)), Q)

This line is supposed to calculate the matrix product Q*D*Q^T, where Q is an n by m matrix and D is a diagonal m by m matrix. What does this mean? I also don't understand the meaning of a cross product between matrices; I only know it between vectors.

Thanks,
Barisdad.
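The identity behind that line can be checked numerically with made-up dimensions: `rep(d, each = nrow(Q))` recycles `d` column-wise (R stores matrices column-major), so `Q * rep(d, each = nrow(Q))` scales column j of Q by d[j], i.e. it computes Q %*% diag(d); and `tcrossprod(A, B)` is A %*% t(B), so the whole expression is Q D Q^T:

```r
set.seed(42)
n <- 4; m <- 3
Q <- matrix(rnorm(n * m), n, m)   # n x m, as in the question
d <- runif(m)                     # diagonal of the m x m matrix D
X1 <- tcrossprod(Q * rep(d, each = nrow(Q)), Q)
X2 <- Q %*% diag(d) %*% t(Q)
all.equal(X1, X2)   # TRUE: the two formulations agree
```

The name comes from the matrix "crossproduct" t(A) %*% B (`crossprod`), not the vector cross product; `tcrossprod` is the transposed variant A %*% t(B).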
[R] bug in R2WinBUGS
In the newest version of R2WinBUGS, the default directory is changed to working.directory, but it is never changed back once the bugs() call finishes.
Re: [R] Lattice: How to draw curves from given formulae
I have another question. How can I put specific names into the strips of the resulting plot? For instance, in the figure produced by the attached code, instead of 'umbrella(d)' I want to have 'UMBRELLA' in the strip.

library(lattice)
flat <- function(d) 0 * d
linear <- function(d) -(1.65/8) * d
logistic <- function(d) 0.015 - 1.73 / (1 + exp(1.2 * (4-d)))
umbrella <- function(d) -(1.65/3) * d + (1.65 / 36) * d^2
emax <- function(d) -1.81 * d / (0.79 + d)
sigmoid <- function(d) -1.70 * d^5 / (4^5 + d^5)
xyplot(flat(d) + linear(d) + logistic(d) + umbrella(d) + emax(d) + sigmoid(d) ~ d,
       data = list(d = seq(0, 8, length = 101)),
       ylab = 'Expected change from baseline in VAS at Week 6',
       xlab = 'Dose', type = 'l', outer = TRUE)

Many thanks

On Tue, Jul 22, 2008 at 4:31 PM, Deepayan Sarkar [EMAIL PROTECTED] wrote:

On 7/22/08, John Smith [EMAIL PROTECTED] wrote:

Thanks, Deepayan. I actually want to have a picture like Figure 5.9 in Mixed-Effects Models in S and S-PLUS, which has a separate panel for each function and the function name printed in the strip.

xyplot(linear(d) + logistic(d) + umbrella(d) ~ d,
       data = list(d = seq(-5, 5, length = 101)),
       type = 'l', outer = TRUE)

There are several ways to make the strip labels just the names; the simplest is to create a data frame with all the necessary columns beforehand.

-Deepayan

Thanks

On Tue, Jul 22, 2008 at 1:08 PM, Deepayan Sarkar [EMAIL PROTECTED] wrote:

On 7/21/08, John Smith [EMAIL PROTECTED] wrote:

Dear R Users: I have a list of functions:

Flat: y = 0
Linear: y = -(1.65/8)d
Logistic: y = 0.015 - 1.73/{1+exp[1.2(4-d)]}
Umbrella: y = -(1.65/3)d + (1.65/36)d^2
Emax: y = -1.81d/(0.79+d)
Sigmoid Emax: y = -1.70d^5/(4^5+d^5)

and I want to draw the figure as attached (the material is extracted from a paper). Could anyone give me sample code to do this?

The attachment didn't come through, but here's one approach (with three of your functions):

linear <- function(d) -(1.65/8) * d
logistic <- function(d) 0.015 - 1.73 / (1 + exp(1.2 * (4-d)))
umbrella <- function(d) -(1.65/3) * d + (1.65 / 36) * d^2
xyplot(linear(d) + logistic(d) + umbrella(d) ~ d,
       data = list(d = seq(-5, 5, length = 101)),
       type = 'l', auto.key = list(lines = TRUE, points = FALSE))

-Deepayan
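Deepayan's "create a data frame beforehand" suggestion aside, a quicker (if less general) route is to override the strip labels directly with `strip.custom()`. A sketch, assuming the six functions from the quoted code are already defined; the labels must follow the order of the terms in the formula:

```r
library(lattice)
xyplot(flat(d) + linear(d) + logistic(d) + umbrella(d) +
         emax(d) + sigmoid(d) ~ d,
       data = list(d = seq(0, 8, length = 101)),
       ylab = 'Expected change from baseline in VAS at Week 6',
       xlab = 'Dose', type = 'l', outer = TRUE,
       # factor.levels replaces the default 'umbrella(d)'-style labels
       strip = strip.custom(factor.levels =
         c('FLAT', 'LINEAR', 'LOGISTIC', 'UMBRELLA', 'EMAX', 'SIGMOID')))
```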
[R] Lattice: How to draw curves from given formulae
Dear R Users:

I have a list of functions:

Flat: y = 0
Linear: y = -(1.65/8)d
Logistic: y = 0.015 - 1.73/{1+exp[1.2(4-d)]}
Umbrella: y = -(1.65/3)d + (1.65/36)d^2
Emax: y = -1.81d/(0.79+d)
Sigmoid Emax: y = -1.70d^5/(4^5+d^5)

and I want to draw the figure as attached (the material is extracted from a paper). Could anyone give me sample code to do this?

Thanks
Re: [R] Lattice: How to draw curves from given formulae
Thanks, Deepayan. I actually want to have a picture like Figure 5.9 in Mixed-Effects Models in S and S-PLUS, which has a separate panel for each function and the function name printed in the strip.

Thanks

On Tue, Jul 22, 2008 at 1:08 PM, Deepayan Sarkar [EMAIL PROTECTED] wrote:

On 7/21/08, John Smith [EMAIL PROTECTED] wrote:

Dear R Users: I have a list of functions:

Flat: y = 0
Linear: y = -(1.65/8)d
Logistic: y = 0.015 - 1.73/{1+exp[1.2(4-d)]}
Umbrella: y = -(1.65/3)d + (1.65/36)d^2
Emax: y = -1.81d/(0.79+d)
Sigmoid Emax: y = -1.70d^5/(4^5+d^5)

and I want to draw the figure as attached (the material is extracted from a paper). Could anyone give me sample code to do this?

The attachment didn't come through, but here's one approach (with three of your functions):

linear <- function(d) -(1.65/8) * d
logistic <- function(d) 0.015 - 1.73 / (1 + exp(1.2 * (4-d)))
umbrella <- function(d) -(1.65/3) * d + (1.65 / 36) * d^2
xyplot(linear(d) + logistic(d) + umbrella(d) ~ d,
       data = list(d = seq(-5, 5, length = 101)),
       type = 'l', auto.key = list(lines = TRUE, points = FALSE))

-Deepayan
[R] how to draw multiple curves (for given formulae) in lattice
Dear R Users,

Could you please write a piece of code to draw Figure 5.9 of Mixed-Effects Models in S and S-PLUS? Thanks
Re: [R] how to plot an user-defined function
Thanks to all of you for your help; it was very useful. But I have a further question. Suppose I have the following mixed-effects model:

thetaMixed <- function(tau, i) {
  w <- 1 / (s^2 + tau^2)
  mu <- sum(theta * w) / sum(w)
  b <- s[i]^2 / (s[i]^2 + tau^2)
  theta[i] * (1 - b) + mu * b
}

I want to draw all the mixed effects in a single figure using

for (i in 1:10) {
  plot(Vectorize(thetaMixed), 0, 2, i=i)
}

and I hope plot will recognize that i is an argument for thetaMixed and that 0, 2 is the range for tau. But it fails. Could anyone kindly help me with this issue? Thanks

On Feb 5, 2008 10:45 PM, Duncan Murdoch [EMAIL PROTECTED] wrote:

jim holtman wrote:

Your function 'll' only returns a single value when passed a vector:

x <- seq(0, 2, .1)
ll(x)
[1] -7.571559

'plot' expects to pass a vector to the function and have it return a vector of the same length; e.g.,

sin(x)
 [1] 0.00000000 0.09983342 0.19866933 0.29552021 0.38941834 0.47942554 0.56464247 0.64421769 0.71735609
[10] 0.78332691 0.84147098 0.89120736 0.93203909 0.96355819 0.98544973 0.99749499 0.99957360 0.99166481
[19] 0.97384763 0.94630009 0.90929743

So you either have to rewrite your function, or have a loop that will evaluate the function at each individual point and then plot it.

Or use Vectorize, e.g.

plot(Vectorize(ll), 0, 2)

Duncan Murdoch

On Feb 5, 2008 7:06 PM, John Smith [EMAIL PROTECTED] wrote:

Dear R-users,

Suppose I have defined a likelihood function ll(tau); how can I plot this likelihood function by passing it to plot? I want to do it like this:

ll <- function(tau) {
  w <- 1 / (s^2 + tau^2)
  mu <- sum(theta * w) / sum(w)
  -1/2 * sum((theta - mu)^2 * w - log(w))
}
plot(ll, 0, 2)

But I get the following error:

Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ
In addition: Warning messages:
1: In s^2 + tau^2 : longer object length is not a multiple of shorter object length
2: In theta * w : longer object length is not a multiple of shorter object length
3: In (theta - mu)^2 * w : longer object length is not a multiple of shorter object length

Thanks
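For the `thetaMixed` question, one sketch (with made-up `s` and `theta`, since the post does not give them): freeze `i` inside each call, vectorize over `tau`, and overlay the ten curves with `curve(..., add = TRUE)`:

```r
# Made-up example data standing in for the poster's workspace objects
s     <- c(15, 10, 16, 11,  9, 11, 10, 18,  8, 12)
theta <- c(28,  8, -3,  7, -1,  1, 18, 12,  5,  2)

thetaMixed <- function(tau, i) {
  w  <- 1 / (s^2 + tau^2)
  mu <- sum(theta * w) / sum(w)
  b  <- s[i]^2 / (s[i]^2 + tau^2)
  theta[i] * (1 - b) + mu * b
}

# Vectorize over tau; i is fixed within each curve() call
tm <- Vectorize(thetaMixed)
curve(tm(x, i = 1), 0, 2, ylim = range(theta),
      xlab = 'tau', ylab = 'mixed effect')
for (i in 2:10) curve(tm(x, i = i), 0, 2, add = TRUE)
```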
[R] R2WinBUGS is broken
Dear R-users,

I am trying to use the following code to reproduce results from Prof. Gelman's book, but get the error listed below with the R2WinBUGS version (the OpenBUGS version is fine). I am using R-2.6.1 on Windows XP, and all the R packages are the most current ones. schools.bug can be found at http://www.stat.columbia.edu/~gelman/bugsR/runningbugs.html . Can anyone help me figure out what's going wrong?

rm(list=ls())
library(R2WinBUGS)
library('BRugs')
setwd('C:/bayesian/bda/codes/example_5_5')
data(schools)
J <- nrow(schools)
y <- schools$estimate
sigma.y <- schools$sd
data <- list(J=J, y=y, sigma.y=sigma.y)
inits <- function() list(theta=rnorm(J,0,100), mu.theta=rnorm(1,0,100),
                         sigma.theta=runif(1,0,100))
parameters <- c('theta', 'mu.theta', 'sigma.theta')
schools.sim <- bugs(data, inits, parameters, model.file='schools.bug',
                    n.chains=3, n.iter=1000, clearWD=TRUE)
## schools.sim <- bugs(data, inits, parameters, model.file='schools.bug',
##                     n.chains=3, n.iter=1000, clearWD=TRUE,
##                     program='openbugs')
print(schools.sim)
plot(schools.sim)

Error in matrix(unlist(mx), ncol = n.cols, byrow = TRUE) :
  invalid 'nrow' value (too large or NA)

Thanks
[R] Hmisc: can not reproduce figure 4 of Statistical Tables and Plots using S and LATEX
Dear R-users:

I cannot reproduce figure 4 of *Statistical Tables and Plots using S and LATEX* by Prof. Frank Harrell with the following code:

rm(list=ls())
library(Hmisc)
getHdata(pbc)
attach(pbc)
age.groups <- cut2(age, c(45,60))
g <- function(y) apply(y, 2, quantile, c(.25,.5,.75))
y <- with(pbc, cbind(Chol=chol, Bili=bili))
# You can give new column names that are not legal S names
# by enclosing them in quotes, e.g. 'Chol (mg/dl)'=chol
vars <- with(pbc, c(label(chol), label(bili)))
label(y) <- paste(vars, collapse=' and ')  # Will make nice caption in table
s3 <- summary(y ~ age.groups + ascites, fun=g, data=pbc)
s3
windows(width=10.67, height=6.60)
par(mfrow=c(1,2), oma=c(3,0,3,0))
for (ivar in 1:2) {
  isub <- (1:3) + (ivar-1)*3
  print(isub)
  plot(s3, which=isub, main='', xlab=vars[ivar], pch=c(91,16,93))
}
mtitle(paste('Quartiles of', label(y)), cex.m=1.5)

Could anyone help me? Thanks