[R] Random Forests Variable Importance Question

2009-04-13 Thread Paul Fisch
I am trying to use the randomForest package for classification in R.

The Variable Importance Measures listed are:

-mean raw importance score of variable x for class 0

-mean raw importance score of variable x for class 1

-MeanDecreaseAccuracy

-MeanDecreaseGini

I know the definitions of these measures. What I want to know is how
to use them.

What I am trying to figure out is how to interpret these values in
practice: what is a good value, what is a bad value, what the maximums
and minimums are, and so on.

If a variable has a high MeanDecreaseAccuracy or MeanDecreaseGini,
does that mean it is important or unimportant? Any information on the
raw scores would be really helpful too. I want to know everything
about these numbers that is relevant to applying them.

I don't really want a technical explanation that uses words like
'error', 'summation', or 'permuted', but rather a simpler explanation
that doesn't involve any discussion of how random forests work (I have
read all about that and didn't find it very helpful).

If I asked someone to explain how to use a radio, I wouldn't expect
the explanation to cover how a radio converts radio waves into sound.

If anyone can help me out at all, it would be really great. I have
read many lectures on random forests and other data mining topics, but
I have never found simple answers about how to read the variable
importance measures.

Thanks,
Paul Fisch
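
A minimal sketch of how these measures are usually read, assuming the randomForest package is installed; the iris subset is only a stand-in data set and is not from the original post. For both MeanDecreaseAccuracy and MeanDecreaseGini, a higher value means the variable matters more to the forest. Neither has a fixed maximum, so the numbers are only meaningful relative to the other variables in the same forest. MeanDecreaseAccuracy close to zero (or negative) suggests the variable is not helping, while MeanDecreaseGini is never negative. The per-class columns are the same accuracy-based score restricted to cases of that class.

```r
# Sketch only: the block does nothing if randomForest is not installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(1)

  # Stand-in two-class problem (an assumption, not the poster's data).
  two_class <- droplevels(iris[iris$Species != "setosa", ])

  # importance = TRUE is needed to get the accuracy-based measures.
  rf <- randomForest::randomForest(Species ~ ., data = two_class,
                                   importance = TRUE)

  # One row per variable; columns are the two classes,
  # MeanDecreaseAccuracy and MeanDecreaseGini.
  imp <- randomForest::importance(rf)

  # Rank the variables: the top rows are the ones the forest relies on most.
  print(imp[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE), ])

  # Dot charts of both measures side by side.
  randomForest::varImpPlot(rf)
}
```

In practice the ranking is what gets used: variables at the top are candidates to keep, and variables whose scores sit near zero are candidates to drop.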

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] need help with stat functions (like adaboost, random forests and glm)

2008-08-13 Thread Paul Fisch
OK, so basically I have a data frame named data_frame.

data_frame contains these columns:
startdate
startprice
endpricethreshold1
endpricethreshold2
endpricethreshold3



All of these endpricethresholds are true/false (binary) vectors.  Each one
is true or false depending on whether the endprice was above or below the
corresponding threshold.

Now I want to use, say, a generalized linear model to predict which endprice
thresholds will be true or false based on startdate and startprice.  So I
have a call like:

glm(endpricethreshold1 ~ ., data=data_frame[,c(1,2,3)],
family=binomial(logit));

But for the first term, endpricethreshold1, I don't want to refer to it by
name (since I really have tons of endpricethresholds and would like to make
this a loop) but instead by its column index, like this:

glm(data_frame[[3]] ~ ., data=data_frame[,c(1,2,3)],
family=binomial(logit));

However, when I do this I am getting completely different results and I have
no idea why.

If anyone could help it would be greatly appreciated.



Thanks,
Paul Fisch
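
One likely explanation for the differing results, offered as a sketch rather than a definitive diagnosis: in a formula, `.` expands to every column of `data` that is not named on the left-hand side. When the response is written as `data_frame[[3]]`, no column name appears on the left, so endpricethreshold1 stays among the predictors and the model is fitted with the response predicting itself. Building the formula from the column name avoids this. The data below are made up, with startdate as a plain number for simplicity, and the names `predictors`, `response_cols` and `fits` are hypothetical.

```r
# Hypothetical stand-in for the poster's data_frame (values are made up).
set.seed(42)
data_frame <- data.frame(
  startdate  = 1:40,
  startprice = runif(40, 10, 100)
)
data_frame$endpricethreshold1 <- data_frame$startprice + rnorm(40, sd = 20) > 50
data_frame$endpricethreshold2 <- data_frame$startprice + rnorm(40, sd = 20) > 70

# With an expression on the left-hand side, "." keeps the response column
# among the predictors.
leaky <- terms(data_frame[[3]] ~ ., data = data_frame[, c(1, 2, 3)])
print(attr(leaky, "term.labels"))

# Loop over response columns by index, but build each formula by name.
predictors    <- c("startdate", "startprice")
response_cols <- 3:4

fits <- lapply(response_cols, function(i) {
  response <- names(data_frame)[i]
  # e.g. endpricethreshold1 ~ startdate + startprice
  f <- reformulate(predictors, response = response)
  glm(f, data = data_frame, family = binomial(link = "logit"))
})
names(fits) <- names(data_frame)[response_cols]

# The looped fit matches the fit that names the column directly.
by_name <- glm(endpricethreshold1 ~ startdate + startprice,
               data = data_frame, family = binomial(link = "logit"))
print(all.equal(coef(fits[["endpricethreshold1"]]), coef(by_name)))
```

Listing the predictors explicitly also keeps the other threshold columns out of each model, which `.` would otherwise pull in if the full data frame were passed.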



[R] Some help with dates.

2008-06-20 Thread Paul Fisch
Hey,

I'm new to R but familiar with other programming languages.

Basically, I want to store an array of dates.  For each of these dates I
want to store only the day of the week and the hour.  So for example:
Monday 12 would be Monday at 12 o'clock and Tuesday 20 would be Tuesday
at 8 p.m.

Alternatively, the day could be stored as 0-6 for Sunday to Saturday.  So
Tuesday at 11 a.m. would be something like 2 11.

These dates don't need to be stored exactly like I have written.  I just
would like to know how R would store a day of the week combined with an hour
in a variable.


Thanks,
Paul
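
A minimal sketch of one way to hold a day of the week plus an hour in base R, assuming the values start out as full timestamps. The timestamps below are made up, and the names `stamps` and `day_hour` are hypothetical. A POSIXlt object exposes `wday` (0 for Sunday through 6 for Saturday) and `hour` (0 through 23) directly, so the two numbers can be kept as columns of a data frame.

```r
# Made-up timestamps; the time zone is fixed to UTC so the result does not
# depend on the local setting.
stamps <- as.POSIXct(
  c("2008-06-16 12:00:00", "2008-06-17 20:00:00", "2008-06-17 11:00:00"),
  tz = "UTC"
)

# POSIXlt stores each date-time as separate fields.
lt <- as.POSIXlt(stamps)

day_hour <- data.frame(
  wday = lt$wday,  # 0 = Sunday ... 6 = Saturday
  hour = lt$hour   # 0-23
)

# Optional readable label, built from fixed English names so it does not
# depend on the locale.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
day_hour$label <- paste(day_names[day_hour$wday + 1], day_hour$hour)

print(day_hour)
```

Keeping the two fields as numbers makes them easy to sort and group on, and the label can be regenerated whenever a readable form is needed.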
