Hi All,

This is my understanding of the Random Forest algorithm:
Random Forest builds a number of trees, each from a randomly selected subset
of samples and features. At each node of a tree it uses the decrease in Gini
impurity to find the best feature-threshold pair (various thresholds are
tested for each feature), i.e. the pair that gives the best separation between
the positive and the negative class.
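
To make the split search described above concrete, here is a minimal pure-Python sketch of how I picture it. The function names (gini, best_split), the midpoint thresholds and the toy data are my own assumptions for illustration; this is not the scikit-learn implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_indices):
    """Return (feature, threshold, impurity_decrease) of the best split found."""
    n = len(y)
    parent = gini(y)
    best = (None, None, 0.0)
    for f in feature_indices:
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            # Impurity of the children, weighted by their sizes.
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if gain > best[2]:
                best = (f, t, gain)
    return best

# Toy data: feature 0 separates the classes cleanly, feature 1 does not.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y = [0, 0, 1, 1]
feature, threshold, gain = best_split(X, y, feature_indices=[0, 1])
```

In a forest, feature_indices would be a random subset of the features, and X, y a bootstrap sample of the training data.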

Question 1:
        I have a two-class classification problem where the positive labels
        reside in clusters. A traditional cross validation approach is not
        aware of this and splits data points from the same cluster between the
        training and test sets, which gives rise to misleadingly strong
        classification performance. I wrote a custom cross validation loop to
        address this issue. However, the bootstrapping method inside the
        Random Forest algorithm randomly selects samples and features and
        controls for overfitting.

        When the fit method is applied to the randomly selected samples, does
        it perform an internal cross validation to prevent overfitting? I did
        not find this in the GitHub code. If yes, can I specify my groupings
        to Random Forest?
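
For reference, the kind of cluster-aware splitting mentioned above can be sketched roughly as follows. This is only an illustration, not my actual loop: the helper name group_k_fold, the round-robin fold assignment and the toy cluster labels are all assumptions.

```python
def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs so that no group is split across them."""
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of groups")
    # Assign whole groups (clusters) to folds in round-robin order.
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test

# One cluster label per sample; samples sharing a label stay together.
groups = ["a", "a", "b", "b", "b", "c", "d", "d"]
splits = list(group_k_fold(groups, n_splits=2))
```

Each (train, test) pair can then be used to index the data before calling fit and predict. Current scikit-learn versions also ship group-aware splitters such as GroupKFold in sklearn.model_selection, which may make a hand-written loop unnecessary.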

Question 2:
        The Gini impurity criterion at each node tries to find the best
        separation between the two classes. I care more about obtaining a
        cleaner separation for my positive class. Is there any way to give
        more importance to one class during the partitioning?
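
To illustrate what I mean by giving importance to one class, here is a small sketch of a class-weighted Gini impurity. The function weighted_gini and the weights are hypothetical; I am only assuming that weighting would act on the impurity in roughly this way. scikit-learn's RandomForestClassifier does accept a class_weight parameter, and its fit method takes sample_weight, but whether they give the effect I am after is exactly my question.

```python
def weighted_gini(labels, class_weight):
    """Gini impurity where each sample counts with the weight of its class."""
    totals = {}
    for label in labels:
        totals[label] = totals.get(label, 0.0) + class_weight.get(label, 1.0)
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in totals.values())

# A node holding one positive (1) and three negatives (0).
node = [1, 0, 0, 0]
plain = weighted_gini(node, {0: 1.0, 1: 1.0})    # both classes weighted equally
boosted = weighted_gini(node, {0: 1.0, 1: 3.0})  # positives count three times
```

With equal weights the node has impurity 0.375; with the positive class weighted three times it rises to 0.5, so a split that isolates the positive sample would be rewarded more.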

Thanks in advance.

Mamun 


