Re: [R] Bug in stepAIC?

2006-10-12 Thread Martin C. Martin
Prof Brian Ripley wrote:
 You sent this earlier to R-devel.  Please do see the posting guide! 
 Since you (incorrectly) thought this was a bug in MASS, you should have 
 contacted the maintainer.

Thanks, but I did try emailing both you and Prof. Venables directly a 
month ago.  After not receiving a response, I emailed R-devel last week. 
  After not receiving a response there, I thought perhaps the code was 
correct after all, and I misunderstood how to call it - a perfect 
question for R-help.

There can be a fine line between R-help and R-devel, which is even 
harder to find when you're new to R and you don't really know where the 
problem is.

 On Wed, 11 Oct 2006, Martin C. Martin wrote:
 
 Hi,

 First of all, thanks for the great work on R in general, and MASS in
 particular.  It's been a life saver for me many times.

 However, I think I've discovered a bug.  It seems that, when I use
 weights during an initial least-squares regression fit, and later try to
 add terms using stepAIC(), it uses the weights when looking to remove
 terms, but not when looking to add them:

 hills.lm - lm(time ~ dist + climb, data = hills, weights = 1/dist2)
 
 Presumably dist^2?

Yes, sorry, a problem with Thunderbird being a little too smart for it's 
own good.  :)

 small.hills.lm - stepAIC(hills.lm)
 stepAIC(small.hills.lm, time ~ dist + climb)

 In the first stepAIC(), it says that the AIC for the full time ~ dist +
 climb is 94.41.  Yet, during the second stepAIC, it says adding climb
 would produce an AIC of 212.1 (and an RSS of 12633.3).  Is this a bug?
 
 Yes, but not in stepAIC.  Consider
 
 drop1(hills.lm)
 Single term deletions
 
 Model:
 time ~ dist + climb
Df Sum of SqRSSAIC
 none  437.64  94.41
 dist1164.05 601.68 103.55
 climb   1  8.66 446.29  93.10
 add1(small.hills.lm, time ~ dist + climb)
 Single term additions
 
 Model:
 time ~ dist
Df Sum of Sq RSS AIC
 none  15787.2   217.9
 climb   13153.8 12633.3   212.1
 stats:::add1.default(small.hills.lm, time ~ dist + climb)
 Single term additions
 
 Model:
 time ~ dist
DfAIC
 none93.097
 climb   1 94.411
 
 so the bug is in add1.lm, part of R itself.  Other code has been altered 
 which then broke add1.lm and 'z' needs to be given class lm.  Now 
 fixed in r-devel and r-patched.

Great; thanks!

Best,
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Bug in stepAIC?

2006-10-11 Thread Martin C. Martin
Hi,

First of all, thanks for the great work on R in general, and MASS in 
particular.  It's been a life saver for me many times.

However, I think I've discovered a bug.  It seems that, when I use 
weights during an initial least-squares regression fit, and later try to 
add terms using stepAIC(), it uses the weights when looking to remove 
terms, but not when looking to add them:

hills.lm - lm(time ~ dist + climb, data = hills, weights = 1/dist2)
small.hills.lm - stepAIC(hills.lm)
stepAIC(small.hills.lm, time ~ dist + climb)

In the first stepAIC(), it says that the AIC for the full time ~ dist + 
climb is 94.41.  Yet, during the second stepAIC, it says adding climb 
would produce an AIC of 212.1 (and an RSS of 12633.3).  Is this a bug?

Best,
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read.table() and scientific notation

2006-10-10 Thread Martin C. Martin
I think the colClasses argument to read.table() is what you need. 
Either that, or explicitly cast columns in the data.frame that's 
returned by read.table().  That's how you get data types that aren't 
directly supported by read.table(), like various date formats.

- Martin

January Weiner wrote:
 Oh, thanks, that was hint enough :-) I see it now. I turns that R does
 not understand
 
 e-10
 
 ...which stands for 1e-10 and is produced by some of the bioinformatic
 applications that I use (notably BLAST). However, R instead of being
 verbose on that just assumes that the whole column is a string.
 
 Is there a way to enforce a specific conversion in R (for example, to
 be able to see where the errors are?).
 
 January


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] update.default evaluating in wrong environment?

2006-10-10 Thread Martin C. Martin
Hi all,

update.default, which is the method used to update lm objects (among
others), extracts the call element from it's first argument, updates
it, then evaluates it in the parent.frame().  Shouldn't it be evaluated
in environment(formula(object)), if that's non-NULL?

I ask because I call lm from within a function, and the data argument
is a local variable of that function.  After that, I can't update the
model any more, since the new lm() call (the one evaled in
parent.frame()) can't find the data.

Best,
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] update.default evaluating in wrong environment?

2006-10-10 Thread Martin C. Martin


Gabor Grothendieck wrote:
 As a workaround use evaluate=FALSE argument to update and
 evaluate it yourself fetching the environment from the innards
 of the lm structure:
 
 f - function() {
   DF - data.frame(y = 1:12, x1 = gl(2, 1, 12), x2 = gl(2,6))
   lm(y ~ x1, DF)
 }
 
 f.lm - f()
 e - attr(terms(f.lm), .Environment)
 eval(update(f.lm, formula = y ~ x2, evaluate = FALSE), e)

Thanks, or even just:

e - environment(formula(f.lm))

But this was more of a bug report.  Is update.default wrong?  Should it 
be changed?  I don't see how evaluating in update's parent environment 
would ever be better default behavior than the formula's environment.

- Martin
 
 
 On 10/10/06, Martin C. Martin [EMAIL PROTECTED] wrote:
 Hi all,

 update.default, which is the method used to update lm objects (among
 others), extracts the call element from it's first argument, updates
 it, then evaluates it in the parent.frame().  Shouldn't it be evaluated
 in environment(formula(object)), if that's non-NULL?

 I ask because I call lm from within a function, and the data argument
 is a local variable of that function.  After that, I can't update the
 model any more, since the new lm() call (the one evaled in
 parent.frame()) can't find the data.

 Best,
 Martin

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] update.default evaluating in wrong environment?

2006-10-10 Thread Martin C. Martin
Great; thanks for the detailed explanation!

- Martin

Thomas Lumley wrote:
 On Tue, 10 Oct 2006, Martin C. Martin wrote:
 Thanks, or even just:

 e - environment(formula(f.lm))

 But this was more of a bug report.  Is update.default wrong?  Should it
 be changed?  I don't see how evaluating in update's parent environment
 would ever be better default behavior than the formula's environment.

 
 This is deliberate (although that doesn't necessarily imply that it is 
 optimal). There already is a bug report (PR#1861) on this issue, so 
 another one certainly isn't appropriate.
 
 The comments for this bug report give an example taken from the scripts 
 for MASS chapter 6, doing a bootstrap of a linear model
 ph.fun - function(data, i) {
   d - data
   d$calls - d$fitted + d$res[i]
   coef(update(fit, data=d))
 }
 
 This is not uncommon for resampling and other perturbations of a model 
 and requires looking in the calling frame of update().
 
 For other uses such as yours the right place would be the frame where 
 the model was created. Neither the environment of the formula nor the 
 calling frame of update() is guaranteed to be the right place and it is 
 easy to come up with examples where each works and the other doesn't.
 
 There really isn't a trivial solution to this.
 
 -thomas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Stopping ctrl-\ from qutting R

2006-10-06 Thread Martin C. Martin
Hi,

In the Linux (FC3) version of R, ctrl-\ quits R.  This wouldn't be so 
bad, but on my keyboard, it's right next to ctrl-p and I tend to hit it 
by accident.

Is there any way to turn that off?

Best,
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Stopping ctrl-\ from qutting R

2006-10-06 Thread Martin C. Martin
previous command.  Very helpful when you want to repeat something you 
recently did, perhaps after modifying it a bit.

- Martin

Alberto Monteiro wrote:
 Martin C. Martin wrote:
 In the Linux (FC3) version of R, ctrl-\ quits R.  This wouldn't be 
 so bad, but on my keyboard, it's right next to ctrl-p and I tend to 
 hit it by accident.

 And what is the use of ctrl-p?
 
 Alberto Monteiro
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Stopping ctrl-\ from qutting R

2006-10-06 Thread Martin C. Martin
Thanks!  I put it in my .bashrc, now it will never bother me again.

Marc Schwartz (via MN) wrote:
 
 BTW, you might want to consider updating your FC distro, as FC3 is now
 EOL 

Unfortunately, this is my work machine and they're still using FC3. 
Although I've heard various grumblings about it, so maybe that will 
change some day...

Best,
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Computing sums of the columns of an array

2005-08-05 Thread Martin C. Martin
Hi,

I have a 5x731 array A, and I want to compute the sums of the columns.  
Currently I do:

apply(A, 2, sum)

But it turns out, this is slow: 70% of my CPU time is spent here, even 
though there are many complicated steps in my computation.

Is there a faster way?

Thanks,
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Computing sums of the columns of an array

2005-08-05 Thread Martin C. Martin
Thanks everyone.

Gavin Simpson wrote:

But neither is that slow on my system. What is A?

It's in the middle of a loop.  I'm doing some maximum likelihood 
estimation, and having to evaluate my model over and over again for 
different parameter values.

Thanks,
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Statistical significance of a classifier

2005-08-05 Thread Martin C. Martin
Liaw, Andy wrote:

From: Martin C. Martin

Hi,

I have a bunch of data points x from two classes A  B, and 
I'm creating 
a classifier.  So I have a function f(x) which estimates the 
probability 
that x is in class A.  (I have an equal number of examples of 
each, so 
p(class) = 0.5.)

One way of seeing how well this does is to compute the error 
rate on the 
test set, i.e. if f(x)0.5 call it A, and see how many times I 
misclassify an item.  That's what MASS does.  But we should 



Surely you mean `99% of dataminers/machine learners' rather than `MASS'?
  


That was my impression, but I didn't want to presume to speak for most 
dataminers/machine learners.


be able to 
do better: misclassifying should be more of a problem if the 
regression 
is confident then if it isn't.

How can I show that my f(x) = P(x is in class A) does better 
than chance?



It depends on what you mean by `better'.  For some problem, people are
perfectly happy with misclassifcation rate.  For others, the estimated
probabilities count a lot more.  One possibility is to look at the ROC
curve.  Another possibility is to look at the calibration curve (see MASS
the book).
  


Thanks, those are getting closer to what I want.  I think the bottom 
line is that I can't really assign a p-value the way I want to, since 
the problem I'm thinking of is ill-posed.

Thanks,
Martin


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Adding sum to derivatives table

2005-08-04 Thread Martin C. Martin
Hi,

Trying this:

deriv(expression(sum(x)), x)

Gives the error message:

Function 'sum' is not in the derivatives table

I'd like to add it, is this difficult?  If not, where is the derivatives 
table?

However, give how basic sum is, I suspect it would have been added if 
it were straightforward.  Do functions in the derivatives table need to 
be vectorized?  i.e. given a vector of length n, they need to produce a 
vector of length n?  Or is there some other restriction?

Thanks,
Martin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Adding sum to derivatives table

2005-08-04 Thread Martin C. Martin


Rolf Turner wrote:

 deriv(expression(sum(x)), x)



   does not make any sense.
  

Good point.  But this does:

deriv(expression(sum(log(a*x))), a)

where a is a scalar.

- Martin

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Help: how to stop the process when there is a mishandling

2005-08-04 Thread Martin C. Martin
The R for Windows FAQ has many useful tips, including this one:

http://cran.r-project.org/bin/windows/base/rw-FAQ.html

It's well worth at least reading the contents, so you know what 
questions it can answer.

- Martin

Duncan Murdoch wrote:

On 8/4/2005 11:39 AM, Shengzhe Wu wrote:
  

and how about R on windows system? since I am currently using R of
windows version.



Hit ESC in the console.  If that doesn't work, there's a bug.  Report 
it to the package maintainer if you're running package code, or here if 
it's base code.  Be sure to give enough details that others can see 
things freeze.

And if it really is frozen, you can use Ctrl-Alt-Del to open the Windows 
  Task Manager, select the Rgui application, and End task.  Assuming 
you have permission, etc.

Duncan Murdoch
  

Thank you,
Shengzhe

On 8/4/05, Roger D. Peng [EMAIL PROTECTED] wrote:


On Unix you can send one of the 'USR' signals to 'kill'.  See
?Signals.

-roger

Shengzhe Wu wrote:
  

Hello,

How to stop the process when there is a mishandling making R system
frozen? If there is a way other than quitting the system when frozen
occurs?

Thank you,
Shengzhe

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html



--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/

  

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
  


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Converting a number of minutes to a difftime

2005-05-12 Thread Martin C. Martin
Hi,
I have a variable m that contains the number of minutes that something 
lasted, e.g.

m - 139
I'd like to convert it to a difftime, so I can add it to the POSIXct 
start time.  But as.difftime seems to want a character string, with at 
most two characters for the minutes.  All other conversion functions 
seem to take strings as well.  What am I missing?

Thanks,
Martin
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Ploting a function of two arguments

2005-03-10 Thread Martin C. Martin
Hi,
I've written a function:
myfun - function(x, y) {
  // blah blah
}
and I want to graph it.  My plan is to use persp(), and my question is: 
how do I create the array of values?  One possibility is:

x - seq(0, 10, by=.1)
y - seq(0, 10, by=.1)
inputs - somehow create an array of x,y pairs
outputs - apply(A, c(1,2), myfun)
The inputs array would need to be 101x101x2 array, i.e. basically a 2D 
array of pairs.  I might need to write a little wrapper for myfun.

Am I on the right track, or is there an easier way to do this?  It seems 
there should be an easier way to graph a 2 argument function like this.

- Martin
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] trouble loading R function code

2005-03-10 Thread Martin C. Martin
try:
source(fhv.R)
David Reinke wrote:
I created an R function offline with a text editor and saved it as fhv.R
When I type in 
load(file=fhv.R)
I get the message
Error: bad restore file magic number (file may be corrupted)-- no data
loaded
I've checked the file, and it opens up fine in Notepad and other text
editors. I also tried this from another folder and got the same message.
What's going on?
Thanks in advance.

David Reinke
Senior Transportation Engineer/Economist
Dowling Associates, Inc.
180 Grand Avenue, Suite 250
Oakland, California 94612-3774
510.839.1742 x104 (voice)
510.839.0871 (fax)
www.dowlinginc.com
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] sample function

2005-03-10 Thread Martin C. Martin
hist is lumping things together.
Try:
sum(temp == 0)
compare to the height of the left most bar.
Is this a bug in hist?
- Martin
mirage sell wrote:
Hi everyone, I need help.
I want to have a uniform kind distribution. When I used sample 
function I got almost twice many zeros compared to other numbers. What's 
wrong with my command ?

temp -sample(0:12, 2000, replace=T,prob=(rep(1/13,13)))
hist(temp)
Thanks in advance,
Taka,
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html