The scikit-learn random forest does not truly handle the multi-output case: it 
only maps to each output one at a time and therefore does not account for the 
correlation between the outputs, which is what the Mahalanobis distance does. 
I, as well as other researchers, have observed this issue for as long as two 
years. Could a solution be implemented in RandomForest, since Python already 
has a function that computes Mahalanobis distances?

On Thursday, February 13, 2020, 10:15:11 PM CST, Andreas Mueller <t3k...@gmail.com> wrote:

On 2/9/20 12:21 PM, Paul Chike Ofoche via scikit-learn wrote:

Hello all,
   
My name is Paul and I am enthusiastic about data science. I have been using 
Python and other programming languages for close to two years. There is an 
issue that I have faced ever since I began applying Python to the analysis of 
my research work.
   
 
 
   
My question has remained unanswered for months. Has nobody else run into the 
need to work with data where the regression produces multiple outputs and 
those outputs are correlated with each other? This is called a multi-output 
multivariate problem. A version of random forest that handles multiple outputs 
is referred to as the multivariate random forest. It is implemented in the R 
programming language (see attached reference documentation below).

    The scikit-learn random forest actually handles this. It doesn't use the 
Mahalanobis distance, but that seems like a simple preprocessing step.
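
A minimal sketch of what such a preprocessing step could look like, assuming the targets are whitened with the Cholesky factor of their sample covariance. Under that assumption, squared Euclidean distance between whitened targets equals squared Mahalanobis distance between the original targets. The synthetic data and all variable names are illustrative assumptions and not part of this thread; RandomForestRegressor does accept a two-dimensional target array.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical data: 3 features, 2 outputs whose noise is correlated.
X = rng.normal(size=(300, 3))
noise = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=300)
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 2]]) + 0.3 * noise

# Whiten the targets: with cov(Y) = L @ L.T, use z = inv(L) @ (y - mu).
# Euclidean distance in Z then matches Mahalanobis distance in Y.
mu = Y.mean(axis=0)
L = np.linalg.cholesky(np.cov(Y, rowvar=False))
Z = np.linalg.solve(L, (Y - mu).T).T

# Fit one forest on both (whitened) outputs at once.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, Z)

# Map predictions back to the original output space: y = L @ z + mu.
Y_pred = forest.predict(X) @ L.T + mu
print(Y_pred.shape)
```

Because leaf predictions are averages and the whitening is linear, undoing the transform on the predictions gives values in the original units. Whether this reproduces the behavior of the R package is not something this sketch establishes.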
 
     
 
To date, no such package exists in Python. My question is whether anybody 
knows how to go about implementing this. The univariate random forest 
regression case uses the Euclidean distance as its measurement criterion, 
whereas the multivariate regression case uses the Mahalanobis distance, which 
takes into account the inter-relationships between the multiple outputs. I have 
inquired about an equivalent capability in Python for many years, but it has 
still not been addressed. Such a multivariate random forest model is very 
applicable to the type of research and analysis that I do. Could someone help, 
please?
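
To make the difference between the two distances concrete, here is a small sketch using NumPy only. The helper name `mahalanobis` and the covariance values are made up for illustration; SciPy also provides `scipy.spatial.distance.mahalanobis`, which takes the inverse covariance matrix as its third argument.

```python
import numpy as np

def mahalanobis(u, v, cov):
    """Mahalanobis distance between u and v for covariance matrix cov."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    # Solve cov @ x = diff instead of forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Two outputs with strong positive correlation (illustrative values).
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
origin = np.array([0.0, 0.0])
along = np.array([1.0, 1.0])     # moves with the correlation
against = np.array([1.0, -1.0])  # moves against the correlation

# Euclidean distance treats both points as equally far from the origin.
e_along = float(np.linalg.norm(along - origin))
e_against = float(np.linalg.norm(against - origin))
print(round(e_along, 3), round(e_against, 3))  # 1.414 1.414

# Mahalanobis distance penalizes the point that contradicts the correlation.
d_along = mahalanobis(along, origin, cov)
d_against = mahalanobis(against, origin, cov)
print(round(d_along, 3), round(d_against, 3))  # 1.054 3.162
```

With an identity covariance matrix the two distances coincide, which is why the distinction only matters when the outputs are correlated or differently scaled.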
   
 Thank you,
   
 Paul Ofoche
   
  
   
PS: This is an important need for multivariate output analysis as a technique 
for solving practical research problems. Here are some questions posted by 
other Python users concerning this same issue.
   
  
   
 
https://datascience.stackexchange.com/questions/21637/code-for-multivariate-random-forest-in-python-r
   
  
   
Multi-output regression
"I have been looking in to Multi-output regression the last view weeks. I am 
working with the scikit learn packag..."

  
  
   
   
  _______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
 
 
