Re: [R] Bug in stepAIC?
Prof Brian Ripley wrote: You sent this earlier to R-devel. Please do see the posting guide! Since you (incorrectly) thought this was a bug in MASS, you should have contacted the maintainer. Thanks, but I did try emailing both you and Prof. Venables directly a month ago. After not receiving a response, I emailed R-devel last week. After not receiving a response there, I thought perhaps the code was correct after all, and I misunderstood how to call it - a perfect question for R-help. There can be a fine line between R-help and R-devel, which is even harder to find when you're new to R and you don't really know where the problem is. On Wed, 11 Oct 2006, Martin C. Martin wrote: Hi, First of all, thanks for the great work on R in general, and MASS in particular. It's been a life saver for me many times. However, I think I've discovered a bug. It seems that, when I use weights during an initial least-squares regression fit, and later try to add terms using stepAIC(), it uses the weights when looking to remove terms, but not when looking to add them: hills.lm - lm(time ~ dist + climb, data = hills, weights = 1/dist2) Presumably dist^2? Yes, sorry, a problem with Thunderbird being a little too smart for it's own good. :) small.hills.lm - stepAIC(hills.lm) stepAIC(small.hills.lm, time ~ dist + climb) In the first stepAIC(), it says that the AIC for the full time ~ dist + climb is 94.41. Yet, during the second stepAIC, it says adding climb would produce an AIC of 212.1 (and an RSS of 12633.3). Is this a bug? Yes, but not in stepAIC. Consider drop1(hills.lm) Single term deletions Model: time ~ dist + climb Df Sum of SqRSSAIC none 437.64 94.41 dist1164.05 601.68 103.55 climb 1 8.66 446.29 93.10 add1(small.hills.lm, time ~ dist + climb) Single term additions Model: time ~ dist Df Sum of Sq RSS AIC none 15787.2 217.9 climb 13153.8 12633.3 212.1 stats:::add1.default(small.hills.lm, time ~ dist + climb) Single term additions Model: time ~ dist DfAIC none93.097 climb 1 94.411 so the bug is in add1.lm, part of R itself. Other code has been altered which then broke add1.lm and 'z' needs to be given class lm. Now fixed in r-devel and r-patched. Great; thanks! Best, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Bug in stepAIC?
Hi, First of all, thanks for the great work on R in general, and MASS in particular. It's been a life saver for me many times. However, I think I've discovered a bug. It seems that, when I use weights during an initial least-squares regression fit, and later try to add terms using stepAIC(), it uses the weights when looking to remove terms, but not when looking to add them: hills.lm - lm(time ~ dist + climb, data = hills, weights = 1/dist2) small.hills.lm - stepAIC(hills.lm) stepAIC(small.hills.lm, time ~ dist + climb) In the first stepAIC(), it says that the AIC for the full time ~ dist + climb is 94.41. Yet, during the second stepAIC, it says adding climb would produce an AIC of 212.1 (and an RSS of 12633.3). Is this a bug? Best, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.table() and scientific notation
I think the colClasses argument to read.table() is what you need. Either that, or explicitly cast columns in the data.frame that's returned by read.table(). That's how you get data types that aren't directly supported by read.table(), like various date formats. - Martin January Weiner wrote: Oh, thanks, that was hint enough :-) I see it now. I turns that R does not understand e-10 ...which stands for 1e-10 and is produced by some of the bioinformatic applications that I use (notably BLAST). However, R instead of being verbose on that just assumes that the whole column is a string. Is there a way to enforce a specific conversion in R (for example, to be able to see where the errors are?). January __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] update.default evaluating in wrong environment?
Hi all, update.default, which is the method used to update lm objects (among others), extracts the call element from it's first argument, updates it, then evaluates it in the parent.frame(). Shouldn't it be evaluated in environment(formula(object)), if that's non-NULL? I ask because I call lm from within a function, and the data argument is a local variable of that function. After that, I can't update the model any more, since the new lm() call (the one evaled in parent.frame()) can't find the data. Best, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] update.default evaluating in wrong environment?
Gabor Grothendieck wrote: As a workaround use evaluate=FALSE argument to update and evaluate it yourself fetching the environment from the innards of the lm structure: f - function() { DF - data.frame(y = 1:12, x1 = gl(2, 1, 12), x2 = gl(2,6)) lm(y ~ x1, DF) } f.lm - f() e - attr(terms(f.lm), .Environment) eval(update(f.lm, formula = y ~ x2, evaluate = FALSE), e) Thanks, or even just: e - environment(formula(f.lm)) But this was more of a bug report. Is update.default wrong? Should it be changed? I don't see how evaluating in update's parent environment would ever be better default behavior than the formula's environment. - Martin On 10/10/06, Martin C. Martin [EMAIL PROTECTED] wrote: Hi all, update.default, which is the method used to update lm objects (among others), extracts the call element from it's first argument, updates it, then evaluates it in the parent.frame(). Shouldn't it be evaluated in environment(formula(object)), if that's non-NULL? I ask because I call lm from within a function, and the data argument is a local variable of that function. After that, I can't update the model any more, since the new lm() call (the one evaled in parent.frame()) can't find the data. Best, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] update.default evaluating in wrong environment?
Great; thanks for the detailed explanation! - Martin Thomas Lumley wrote: On Tue, 10 Oct 2006, Martin C. Martin wrote: Thanks, or even just: e - environment(formula(f.lm)) But this was more of a bug report. Is update.default wrong? Should it be changed? I don't see how evaluating in update's parent environment would ever be better default behavior than the formula's environment. This is deliberate (although that doesn't necessarily imply that it is optimal). There already is a bug report (PR#1861) on this issue, so another one certainly isn't appropriate. The comments for this bug report give an example taken from the scripts for MASS chapter 6, doing a bootstrap of a linear model ph.fun - function(data, i) { d - data d$calls - d$fitted + d$res[i] coef(update(fit, data=d)) } This is not uncommon for resampling and other perturbations of a model and requires looking in the calling frame of update(). For other uses such as yours the right place would be the frame where the model was created. Neither the environment of the formula nor the calling frame of update() is guaranteed to be the right place and it is easy to come up with examples where each works and the other doesn't. There really isn't a trivial solution to this. -thomas __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Stopping ctrl-\ from qutting R
Hi, In the Linux (FC3) version of R, ctrl-\ quits R. This wouldn't be so bad, but on my keyboard, it's right next to ctrl-p and I tend to hit it by accident. Is there any way to turn that off? Best, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Stopping ctrl-\ from qutting R
previous command. Very helpful when you want to repeat something you recently did, perhaps after modifying it a bit. - Martin Alberto Monteiro wrote: Martin C. Martin wrote: In the Linux (FC3) version of R, ctrl-\ quits R. This wouldn't be so bad, but on my keyboard, it's right next to ctrl-p and I tend to hit it by accident. And what is the use of ctrl-p? Alberto Monteiro __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Stopping ctrl-\ from qutting R
Thanks! I put it in my .bashrc, now it will never bother me again. Marc Schwartz (via MN) wrote: BTW, you might want to consider updating your FC distro, as FC3 is now EOL Unfortunately, this is my work machine and they're still using FC3. Although I've heard various grumblings about it, so maybe that will change some day... Best, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Computing sums of the columns of an array
Hi, I have a 5x731 array A, and I want to compute the sums of the columns. Currently I do: apply(A, 2, sum) But it turns out, this is slow: 70% of my CPU time is spent here, even though there are many complicated steps in my computation. Is there a faster way? Thanks, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Computing sums of the columns of an array
Thanks everyone. Gavin Simpson wrote: But neither is that slow on my system. What is A? It's in the middle of a loop. I'm doing some maximum likelihood estimation, and having to evaluate my model over and over again for different parameter values. Thanks, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Statistical significance of a classifier
Liaw, Andy wrote: From: Martin C. Martin Hi, I have a bunch of data points x from two classes A B, and I'm creating a classifier. So I have a function f(x) which estimates the probability that x is in class A. (I have an equal number of examples of each, so p(class) = 0.5.) One way of seeing how well this does is to compute the error rate on the test set, i.e. if f(x)0.5 call it A, and see how many times I misclassify an item. That's what MASS does. But we should Surely you mean `99% of dataminers/machine learners' rather than `MASS'? That was my impression, but I didn't want to presume to speak for most dataminers/machine learners. be able to do better: misclassifying should be more of a problem if the regression is confident then if it isn't. How can I show that my f(x) = P(x is in class A) does better than chance? It depends on what you mean by `better'. For some problem, people are perfectly happy with misclassifcation rate. For others, the estimated probabilities count a lot more. One possibility is to look at the ROC curve. Another possibility is to look at the calibration curve (see MASS the book). Thanks, those are getting closer to what I want. I think the bottom line is that I can't really assign a p-value the way I want to, since the problem I'm thinking of is ill-posed. Thanks, Martin [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Adding sum to derivatives table
Hi, Trying this: deriv(expression(sum(x)), x) Gives the error message: Function 'sum' is not in the derivatives table I'd like to add it, is this difficult? If not, where is the derivatives table? However, give how basic sum is, I suspect it would have been added if it were straightforward. Do functions in the derivatives table need to be vectorized? i.e. given a vector of length n, they need to produce a vector of length n? Or is there some other restriction? Thanks, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Adding sum to derivatives table
Rolf Turner wrote: deriv(expression(sum(x)), x) does not make any sense. Good point. But this does: deriv(expression(sum(log(a*x))), a) where a is a scalar. - Martin [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Help: how to stop the process when there is a mishandling
The R for Windows FAQ has many useful tips, including this one: http://cran.r-project.org/bin/windows/base/rw-FAQ.html It's well worth at least reading the contents, so you know what questions it can answer. - Martin Duncan Murdoch wrote: On 8/4/2005 11:39 AM, Shengzhe Wu wrote: and how about R on windows system? since I am currently using R of windows version. Hit ESC in the console. If that doesn't work, there's a bug. Report it to the package maintainer if you're running package code, or here if it's base code. Be sure to give enough details that others can see things freeze. And if it really is frozen, you can use Ctrl-Alt-Del to open the Windows Task Manager, select the Rgui application, and End task. Assuming you have permission, etc. Duncan Murdoch Thank you, Shengzhe On 8/4/05, Roger D. Peng [EMAIL PROTECTED] wrote: On Unix you can send one of the 'USR' signals to 'kill'. See ?Signals. -roger Shengzhe Wu wrote: Hello, How to stop the process when there is a mishandling making R system frozen? If there is a way other than quitting the system when frozen occurs? Thank you, Shengzhe __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Roger D. Peng http://www.biostat.jhsph.edu/~rpeng/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Converting a number of minutes to a difftime
Hi, I have a variable m that contains the number of minutes that something lasted, e.g. m - 139 I'd like to convert it to a difftime, so I can add it to the POSIXct start time. But as.difftime seems to want a character string, with at most two characters for the minutes. All other conversion functions seem to take strings as well. What am I missing? Thanks, Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] Ploting a function of two arguments
Hi, I've written a function: myfun - function(x, y) { // blah blah } and I want to graph it. My plan is to use persp(), and my question is: how do I create the array of values? One possibility is: x - seq(0, 10, by=.1) y - seq(0, 10, by=.1) inputs - somehow create an array of x,y pairs outputs - apply(A, c(1,2), myfun) The inputs array would need to be 101x101x2 array, i.e. basically a 2D array of pairs. I might need to write a little wrapper for myfun. Am I on the right track, or is there an easier way to do this? It seems there should be an easier way to graph a 2 argument function like this. - Martin __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] trouble loading R function code
try: source(fhv.R) David Reinke wrote: I created an R function offline with a text editor and saved it as fhv.R When I type in load(file=fhv.R) I get the message Error: bad restore file magic number (file may be corrupted)-- no data loaded I've checked the file, and it opens up fine in Notepad and other text editors. I also tried this from another folder and got the same message. What's going on? Thanks in advance. David Reinke Senior Transportation Engineer/Economist Dowling Associates, Inc. 180 Grand Avenue, Suite 250 Oakland, California 94612-3774 510.839.1742 x104 (voice) 510.839.0871 (fax) www.dowlinginc.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] sample function
hist is lumping things together. Try: sum(temp == 0) compare to the height of the left most bar. Is this a bug in hist? - Martin mirage sell wrote: Hi everyone, I need help. I want to have a uniform kind distribution. When I used sample function I got almost twice many zeros compared to other numbers. What's wrong with my command ? temp -sample(0:12, 2000, replace=T,prob=(rep(1/13,13))) hist(temp) Thanks in advance, Taka, __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html