[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-11-17 Thread German Eduardo Melo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15675699#comment-15675699
 ] 

German Eduardo Melo commented on SPARK-9478:


[~sethah] I was wondering whether you are still working on this request. The current PR 
for the improvement is https://github.com/apache/spark/pull/13851, right? I am 
looking forward to this feature for my research; thanks a lot in advance for any 
update!

> Add class weights to Random Forest
> --
>
> Key: SPARK-9478
> URL: https://issues.apache.org/jira/browse/SPARK-9478
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.4.1
>Reporter: Patrick Crenshaw
>
> Currently, this implementation of random forest does not support class 
> weights. Class weights are important when there is imbalanced training data 
> or the evaluation metric of a classifier is imbalanced (e.g. true positive 
> rate at some false positive threshold). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-11-01 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15626725#comment-15626725
 ] 

Joseph K. Bradley commented on SPARK-9478:
--

Removing target version since 2.1 RC1 is being cut soon







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-10-10 Thread Seth Hendrickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563919#comment-15563919
 ] 

Seth Hendrickson commented on SPARK-9478:
-

I'm going to revive this, and hopefully submit a PR soon.







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-06-23 Thread Seth Hendrickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15347138#comment-15347138
 ] 

Seth Hendrickson commented on SPARK-9478:
-

[~mengxr] Thanks for your feedback. Originally I did not implement a change to 
the sampling semantics, though after some thought it does not seem entirely 
correct to only apply the sampling weights after bagging. I checked 
scikit-learn and they do not use weighted sampling (instead applying weights 
after taking uniform samples), but I think we should implement the weighted 
sampling assuming it can fit into the current Spark abstractions.

From my understanding, it is reasonable to use the Poisson distribution as an 
approximation to the Multinomial sampling. Currently, we approximate binomial 
sampling using a Poisson sampler with constant mean. To implement weighted 
sampling with replacement, we can use a Poisson sampler with mean parameter 
proportional to the sample weight - is that correct? We could use the 
{{RandomDataGenerator}} class in StratifiedSamplingUtils, which maintains a 
cache of Poisson sampling functions. I am not an expert in sampling algorithms 
so I really appreciate your thoughts on this.
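
A minimal sketch of the Poisson approach described above, assuming the per-instance mean is the subsampling rate times the weight divided by the mean weight (that normalization is an assumption, not something settled in this thread). The helper names `poisson_draw` and `weighted_bag_counts` are hypothetical, and the sketch uses plain Python rather than Spark's samplers.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw one Poisson(mean) variate with Knuth's multiplication method.

    Adequate for the small means used in bagging; not meant for large means.
    """
    if mean <= 0.0:
        return 0
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def weighted_bag_counts(weights, subsampling_rate, num_trees, seed=0):
    """Return counts[i][t]: how often instance i appears in tree t's bag.

    Assumption for this sketch: the Poisson mean for instance i is
    subsampling_rate * w_i / mean(w), so the expected bag size matches
    unweighted Poisson bagging at the same rate.
    """
    rng = random.Random(seed)
    mean_weight = sum(weights) / len(weights)
    return [
        [poisson_draw(subsampling_rate * w / mean_weight, rng)
         for _ in range(num_trees)]
        for w in weights
    ]
```

With weights [1, 1, 1, 9] and a rate of 1.0, the heavy instance is expected to appear about three times per tree and each light instance about one third of a time.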







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-06-23 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345946#comment-15345946
 ] 

Xiangrui Meng commented on SPARK-9478:
--

Sorry for being late to the discussion! Instance weight support is a superset 
of class weight support, since we can easily map each instance's label to the 
corresponding class weight and use it as the instance weight. Is that correct? 
There could be storage overhead, since it is hard to tell inside the algorithm 
whether the column is mapped or has raw data. We can add an extra counter to 
the aggregator to keep the semantics of "minInstancesPerNode". Neither of these 
issues would prevent us from implementing instance weights. So I would +1 on 
implementing instance weights, which also covers class weights. It also helps 
the use cases where the labeled instances are associated with confidence 
scores, e.g., labels from implicit observations.
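
One way to picture the two points above, as a sketch only: class weights become instance weights through a lookup on the label, and the node statistics carry an unweighted counter next to the weighted total so that "minInstancesPerNode" keeps counting rows. `NodeStats` and `is_valid_split` are hypothetical names for this illustration, not Spark classes.

```python
from dataclasses import dataclass


def instance_weights_from_class_weights(labels, class_weights):
    """Look up each instance's weight from its label."""
    return [class_weights[label] for label in labels]


@dataclass
class NodeStats:
    """Weighted total feeds the impurity; raw count feeds the size check."""
    weighted_count: float = 0.0
    raw_count: int = 0

    def add(self, weight):
        self.weighted_count += weight
        self.raw_count += 1


def is_valid_split(left, right, min_instances_per_node):
    """Apply the minimum-size rule to unweighted row counts of both children."""
    return (left.raw_count >= min_instances_per_node
            and right.raw_count >= min_instances_per_node)
```

A child holding a single rare row with weight 10 has a weighted count of 10 but a raw count of 1, so a minimum of 2 instances per node still rejects that split.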

I didn't find any discussion about the semantic change in sampling strategy. 
Whether we implement class weights or instance weights, the simple random 
sampling should become weighted sampling to reflect weights, and the weights 
should be updated based on the sample ratio. Otherwise, even if we put large 
weights on a few instances with rare labels, they are likely to be dropped 
during sampling.
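
To make the concern concrete, here is a small calculation under the assumption that each instance's number of copies in a bag is an independent Poisson draw. The function names are hypothetical, and `corrected_weight` is only one possible reading of "updated based on the sample ratio".

```python
import math


def prob_all_dropped(num_rare, mean_copies):
    """Probability that none of `num_rare` instances lands in a bag.

    Assumes each instance's copy count is an independent
    Poisson(mean_copies) draw, so P(count == 0) = exp(-mean_copies).
    """
    return math.exp(-mean_copies * num_rare)


def corrected_weight(weight, mean_copies):
    """Weight to apply after weighted sampling (an interpretation).

    Dividing by the expected number of copies keeps the instance's
    expected total contribution equal to its original weight.
    """
    return weight / mean_copies
```

For example, `prob_all_dropped(5, 0.1)` is the chance that a tree trained on a 10% uniform subsample sees none of five rare rows, while `prob_all_dropped(5, 2.0)` is the same chance when each rare row is sampled with mean 2.0.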

Just my two cents, I think [~josephkb] could help make a decision.







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-06-22 Thread Yuewei Na (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345447#comment-15345447
 ] 

Yuewei Na commented on SPARK-9478:
--

Hi [~sethah]. Actually, the code in my PR has been used in our company for some 
time, and we recently decided to open-source it. We used this implementation 
because there is no class weight support in the current version and we have 
practical needs for it. Compared to sample weights, our version saves memory 
since it doesn't need to add a column to store sample weights.

At the same time, I browsed the APIs and implementations of the ensemble 
methods in scikit-learn. It's true that the class weights are integrated 
together with sample weights there. Given also the need for sample weights in 
various other models, I agree that functionality that supports sample weights 
is a better choice. So now I have some thoughts on this problem:
  1. I agree with you on implementing a mechanism to support class weights. I 
think it will reduce users' effort to achieve their goal.
  2. Since my PR is a lightweight version and it has been tested and used in 
our company for some time, we could review and merge my PR to the master 
branch first to make it available to users who need it. We can then remove it 
once nothing blocks the instance weight version, while preserving the same 
interface for setting the class weights. We could also create a new JIRA that 
separates the problem 'adding class weights' from the problem 'adding instance 
weights'. At the very least, the title of the current JIRA should be changed 
or a new JIRA should be created.

 







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-06-22 Thread Seth Hendrickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345208#comment-15345208
 ] 

Seth Hendrickson commented on SPARK-9478:
-

Thanks for your timely feedback! There are many use cases for sample weights in 
machine learning algorithms that are broadly applicable. In regression, it is 
common to use sample weights to account for changing variance in the data 
generation process. Sample weights can also be used in both classification and 
regression to weight more recent data points that may be more reflective of the 
data generation model. Handling imbalanced datasets with class weights can be 
seen as a specific case of sample weights. Using upsampling/downsampling can 
cause unnecessary duplication of the input data and also makes it more 
difficult to assign arbitrary weights to samples. Furthermore, implementing 
weighted boosting algorithms such as AdaBoost or LogitBoost will not be 
possible without sample weights.

Scikit-learn does indeed support sample weights, as you can see 
[here|http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.fit],
 and in fact the algorithms simply convert class weights into sample weights. 
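
For illustration, the conversion can be sketched in a few lines. The weighting formula below is the "balanced" heuristic, n_samples / (n_classes * count), reimplemented in plain Python; the function names are hypothetical and this is not scikit-learn's code.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def expand_to_sample_weights(labels, class_weights):
    """Give every row the weight of its class."""
    return [class_weights[label] for label in labels]


def weighted_class_totals(labels, sample_weights):
    """Sum of sample weights per class, to check the effect of weighting."""
    totals = {}
    for label, weight in zip(labels, sample_weights):
        totals[label] = totals.get(label, 0.0) + weight
    return totals
```

With eight rows of class 0 and two rows of class 1, the per-class weighted totals come out equal, which is the balancing effect that class weights are meant to achieve, and no input rows are duplicated.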

With this in mind, I think we should support sample weights. We might also want 
to implement a mechanism to support class weights in the API where users don't 
have to manually convert class weights to sample weights - we can open a new 
JIRA to discuss it. [There is an ongoing effort in MLlib to support instance 
weighting|https://issues.apache.org/jira/browse/SPARK-9610] in the various 
algorithms and so I think it is beneficial to add it to trees and forests.







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-06-22 Thread Yuewei Na (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345105#comment-15345105
 ] 

Yuewei Na commented on SPARK-9478:
--

Hi [~sethah], thanks a lot for your comment on my PR and your continued 
attention to this problem. Sorry for not commenting before I made this PR. As 
you said, the major reason I made another PR is exactly the title of this 
JIRA. 

I implemented this class weight version instead of sticking to instance weights 
because:
1. Existing implementations in other languages or packages, e.g. rpart in R and 
sklearn in Python, all support class weights instead of instance weights. 
Indeed, instance weights also make weighting in regression possible. But the 
major application in handling imbalanced datasets is classification. If one 
does need such a feature, it could be done by downsampling or upsampling the 
whole dataset. In the materials that I've read, including the book 'Elements of 
Statistical Learning', rpart's documentation 
(https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf), 
and some professor's slides, I've never seen a use case for handling 
imbalanced datasets in regression problems using Random Forest. I would be very 
happy if someone could tell me when it's needed.

2. As you commented in the first PR, the instance weight implementation causes 
trouble for the 'minInstancesPerNode' feature. The class weight implementation 
has no such issue, which will make the code more stable because very few 
internal modifications are needed.







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-06-22 Thread Seth Hendrickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344950#comment-15344950
 ] 

Seth Hendrickson commented on SPARK-9478:
-

There has been a bit of confusion regarding this JIRA, I think. [~pcrenshaw] 
Please do correct me if I'm wrong, but the JIRA is, in truth, for adding a 
mechanism to handle imbalanced classification datasets. This could be done 
through class weighting or through instance weighting; I suppose the 
implementation is up for debate. 

There has been potentially more confusion since an initial PR was made using 
instance weighting. Now there is a PR made which adds class weighting. I think 
adding instance weighting is the best approach here because it allows users to 
handle imbalanced outcome classes in their data, but also adds the ability to 
use instance weighting generically which has a broad range of use cases. 
Additionally, it is not specific to classification. Also, this is how the other 
ML algorithms have so far dealt with it and it will allow forests and trees to 
conform to the same API as Logistic/Linear regression, for example. 

I vote to change this JIRA title to "Add instance weights for Random Forest and 
Decision Trees" and proceed accordingly, but I'm open to other opinions. If we 
want to pursue class weights we can do it in a separate JIRA. And again, I have 
a PR ready for this which I have not submitted because of a.) other blocking 
issues and b.) Spark 2.0 QA takes review precedence for the time being. 

I look forward to others' thoughts. Ping [~josephkb] (I cannot ping n-triple-a 
because I don't know the JIRA username).







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-06-22 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344892#comment-15344892
 ] 

Apache Spark commented on SPARK-9478:
-

User 'n-triple-a' has created a pull request for this issue:
https://github.com/apache/spark/pull/13851







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-04-22 Thread Seth Hendrickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254539#comment-15254539
 ] 

Seth Hendrickson commented on SPARK-9478:
-

[~josephkb] I have a PR ready for this. It's being blocked by 
[SPARK-14610|https://issues.apache.org/jira/browse/SPARK-14610] and 
[SPARK-14599|https://issues.apache.org/jira/browse/SPARK-14599]. Is there any 
hope to get it in 2.0?







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2016-02-03 Thread Fabian Boehnlein (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130323#comment-15130323
 ] 

Fabian Boehnlein commented on SPARK-9478:
-

Thanks, [~meihuawu], that would be very useful! Any ideas who could help review this?







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2015-10-06 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946201#comment-14946201
 ] 

Apache Spark commented on SPARK-9478:
-

User 'rotationsymmetry' has created a pull request for this issue:
https://github.com/apache/spark/pull/9008







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2015-10-06 Thread Patrick Crenshaw (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945282#comment-14945282
 ] 

Patrick Crenshaw commented on SPARK-9478:
-

[~meihuawu] No, I am not working on this. 







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2015-10-04 Thread Meihua Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942754#comment-14942754
 ] 

Meihua Wu commented on SPARK-9478:
--

[~pcrenshaw] Are you working on this? If not, I can send a PR based on 
[~josephkb]'s suggestions. 







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2015-08-04 Thread Patrick Crenshaw (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653799#comment-14653799
 ] 

Patrick Crenshaw commented on SPARK-9478:
-

If I work on this, should I wait until 
https://issues.apache.org/jira/browse/SPARK-3717 is finished?







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2015-07-30 Thread Patrick Crenshaw (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647772#comment-14647772
 ] 

Patrick Crenshaw commented on SPARK-9478:
-

Similar to this ticket for Logistic Regression 
https://issues.apache.org/jira/browse/SPARK-7685 and this one for SVMWithSGD 
https://issues.apache.org/jira/browse/SPARK-3246







[jira] [Commented] (SPARK-9478) Add class weights to Random Forest

2015-07-30 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648184#comment-14648184
 ] 

Joseph K. Bradley commented on SPARK-9478:
--

This sounds valuable.  Handling it by reweighting examples (as is being done 
for logreg) seems like the simplest solution for now.  I'll keep an eye on the 
ticket!
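
A minimal sketch of what reweighting examples means for a tree's split criterion, assuming Gini impurity computed from summed weights rather than row counts. `weighted_gini` is a hypothetical helper for illustration, not Spark's impurity implementation.

```python
from collections import defaultdict


def weighted_gini(labels, weights):
    """Gini impurity where each row contributes its weight instead of 1."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    total = sum(totals.values())
    if total == 0.0:
        # An empty (or zero-weight) node is treated as pure.
        return 0.0
    return 1.0 - sum((t / total) ** 2 for t in totals.values())
```

With nine rows of class 0 and one row of class 1, unit weights give an impurity of 0.18, while weighting the minority row by 9 gives 0.5, the same as a balanced node.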



