Re: [scikit-learn] Need for multioutput multivariate algorithm for Random Forest in Python (using Mahalanobis distance)

Paul Chike Ofoche via scikit-learn Fri, 14 Feb 2020 04:40:27 -0800

Scikit-learn's random forest does not handle the multi-output case; it only
maps to each output one at a time, thereby not accounting for the correlation
between the multiple outputs, which is what the Mahalanobis distance does. Other
researchers and I have observed this issue for as long as two years. Could
there be a solution to implement it in RandomForest, since Python already has a
function that computes Mahalanobis distances?
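For reference, SciPy provides one such function, scipy.spatial.distance.mahalanobis(u, v, VI), which takes the inverse covariance matrix VI rather than the covariance itself. The sketch below uses an assumed 2x2 covariance, chosen only for illustration, to show why the distance matters for correlated outputs. Two points at the same Euclidean distance from the origin get very different Mahalanobis distances depending on whether they move with or against the correlation.

```python
import numpy as np
from scipy.spatial.distance import euclidean, mahalanobis

# Assumed covariance of two positively correlated outputs (illustrative values only).
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
VI = np.linalg.inv(cov)  # mahalanobis() expects the INVERSE covariance

origin = np.array([0.0, 0.0])
along = np.array([1.0, 1.0])     # moves in the direction the outputs co-vary
against = np.array([1.0, -1.0])  # moves against the correlation

# Both points are the same Euclidean distance from the origin...
print(euclidean(origin, along), euclidean(origin, against))

# ...but the Mahalanobis distance penalises the direction the outputs rarely take.
d_along = mahalanobis(origin, along, VI)
d_against = mahalanobis(origin, against, VI)
print(d_along, d_against)  # sqrt(0.4 / 0.36) ~ 1.054 and sqrt(10) ~ 3.162
```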

On Thursday, February 13, 2020, 10:15:11 PM CST, Andreas Mueller
<[email protected]> wrote:

On 2/9/20 12:21 PM, Paul Chike Ofoche via scikit-learn wrote:

Hello all,

My name is Paul and I am enthused about data science. I have been using Python
and other programming languages for close to two years. There is an issue that
I have been facing since I began applying Python to the analysis of my research
work.

My question has remained unanswered for months. Has anybody else run into the
need to work with data where the regression produces multiple outputs that are
correlated with each other? This is called a multi-output multivariate problem.
A version of random forest that handles multiple outputs is referred to as the
multivariate random forest. It is implemented in R (see attached reference
documentation below).

The scikit-learn random forest actually handles this. It doesn't use the
Mahalanobis distance, but that seems like a simple preprocessing step.
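One way to read that preprocessing suggestion, sketched under stated assumptions: whiten the targets with the Cholesky factor L of their covariance (cov = L L^T), so that the squared Euclidean distance between whitened rows equals the squared Mahalanobis distance between the original rows. Then fit a single RandomForestRegressor on the 2-D whitened targets and map its predictions back. The helper names whiten/unwhiten and the synthetic data are hypothetical. The covariance is also estimated once from all training targets, not re-estimated per node.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def whiten(Y):
    """Map targets to z = L^{-1} (y - mu), where cov(Y) = L @ L.T (Cholesky).

    Squared Euclidean distance between rows of Z equals squared Mahalanobis
    distance between the corresponding rows of Y. (Hypothetical helper.)
    """
    mu = Y.mean(axis=0)
    L = np.linalg.cholesky(np.cov(Y, rowvar=False))  # needs a positive-definite covariance
    Z = np.linalg.solve(L, (Y - mu).T).T
    return Z, mu, L


def unwhiten(Z, mu, L):
    """Inverse map y = L z + mu, applied row-wise. (Hypothetical helper.)"""
    return Z @ L.T + mu


# Hypothetical synthetic data: 3 features, 2 correlated outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
noise = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=500)
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 2]]) + 0.3 * noise

Z, mu, L = whiten(Y)

# One forest fitted jointly on both whitened outputs; its squared-error
# criterion therefore measures node spread in Mahalanobis terms.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Z)

# Predictions come back in whitened space; undo the transform.
Y_hat = unwhiten(forest.predict(X[:5]), mu, L)
print(Y_hat.shape)  # (5, 2)
```

Because the transform is affine, mapping averaged leaf means of Z back gives the corresponding averages of Y, so predictions stay on the original scale. The Cholesky step fails if the outputs are exactly collinear.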

To date, there exists no such package in Python. My question is whether
anybody knows how to go about implementing this. The univariate random forest
regression case uses the Euclidean distance as the splitting criterion,
whereas the multivariate regression case uses the Mahalanobis distance, which
takes into account the inter-relationships between the multiple outputs. I have
inquired about an equivalent capability in Python for many years, but it has
still not been addressed. Such a multivariate random forest mode is very
applicable to the type of research and analysis that I do. Could someone help,
please?
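To make the contrast concrete, here is a minimal sketch of a Mahalanobis-based split criterion for a single node. A node's cost is the sum of squared Mahalanobis distances of its target rows to the node mean, and the split is chosen to minimise the children's total. This is one common formulation, assumed here for illustration; it is not taken from the R package's source. Using one global inverse covariance VI, rather than re-estimating it per node, is a further assumption. The function names are hypothetical.

```python
import numpy as np


def mahalanobis_node_cost(Y, VI):
    """Sum over rows of (y_i - mean)^T VI (y_i - mean) for one node's targets."""
    if len(Y) == 0:
        return 0.0
    D = Y - Y.mean(axis=0)
    # einsum evaluates d_i^T VI d_i for every row i and sums the results
    return float(np.einsum("ij,jk,ik->", D, VI, D))


def best_split_on_feature(x, Y, VI, min_leaf=1):
    """Exhaustive threshold search on one feature x, minimising the summed
    Mahalanobis cost of the two children (hypothetical helper)."""
    min_leaf = max(1, min_leaf)
    order = np.argsort(x)
    x_sorted, Y_sorted = x[order], Y[order]
    best_thr, best_cost = None, np.inf
    for i in range(min_leaf, len(x) - min_leaf + 1):
        if x_sorted[i - 1] == x_sorted[i]:
            continue  # identical feature values cannot be separated
        cost = (mahalanobis_node_cost(Y_sorted[:i], VI)
                + mahalanobis_node_cost(Y_sorted[i:], VI))
        if cost < best_cost:
            best_thr = 0.5 * (x_sorted[i - 1] + x_sorted[i])
            best_cost = cost
    return best_thr, best_cost


# Synthetic example: two outputs that both jump at x = 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(size=200)
Y = np.where(x[:, None] < 0.5, [0.0, 0.0], [1.0, 1.0]) + 0.1 * rng.normal(size=(200, 2))
VI = np.linalg.inv(np.cov(Y, rowvar=False))  # one global inverse covariance (assumption)
thr, cost = best_split_on_feature(x, Y, VI)
print(thr)  # lands close to the true break at 0.5
```

The brute-force loop recomputes each child's cost from scratch, which is O(n^2) per feature; a practical implementation would update running sums as the threshold moves. With a single global VI, this cost equals the ordinary sum of squared Euclidean distances computed after whitening the outputs.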

Thank you,

Paul Ofoche

PS: This is an important need for multivariate output analysis as a technique
for solving practical research problems. Here are some questions posted by
various other Python users concerning this same issue.

https://datascience.stackexchange.com/questions/21637/code-for-multivariate-random-forest-in-python-r

Multi-output regression

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
