Re: [scikit-learn] Need for multioutput multivariate algorithm for Random Forest in Python (using Mahalanobis distance)

Nicolas Hug Fri, 14 Feb 2020 05:01:50 -0800

Hi Paul,

The way multioutput is handled in decision trees (and thus in the forests) is described in https://scikit-learn.org/stable/modules/tree.html#multi-output-problems. As you can see, the correlation between the output values *is* taken into account.
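
As a minimal sketch of what that page describes (the synthetic data, variable names, and hyperparameters here are illustrative assumptions, not something from this thread), a single RandomForestRegressor can be fit directly on a 2-D target array. According to the linked documentation, the splitting criterion is then averaged over all outputs, so one set of trees predicts every output jointly:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)

# Illustrative data: 200 samples, 3 features, 2 correlated outputs.
X = rng.uniform(-1, 1, size=(200, 3))
y1 = X[:, 0] + 0.1 * rng.normal(size=200)
y2 = 2 * y1 + X[:, 1] + 0.1 * rng.normal(size=200)  # depends on y1
Y = np.column_stack([y1, y2])  # shape (n_samples, n_outputs)

# One forest, fit on both outputs at once (no per-output wrapper).
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, Y)

pred = forest.predict(X[:5])
print(pred.shape)  # one row per sample, one column per output
```

This differs from wrapping the estimator in sklearn.multioutput.MultiOutputRegressor, which fits one independent model per output.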


Can you explain what you would like to modify there?

Nicolas

On 2/14/20 7:37 AM, Paul Chike Ofoche via scikit-learn wrote:

Scikit-learn random forest does *not* handle the multi-output case, but only maps to each output one at a time, thereby not accounting for the correlation between multi-outputs, which is what the Mahalanobis distance does. I, as well as other researchers, have observed this issue for as long as two years. Could there be a solution to implement it in RandomForest, since Python already has a function that computes Mahalanobis distances?
On Thursday, February 13, 2020, 10:15:11 PM CST, Andreas Mueller <t3k...@gmail.com> wrote:
On 2/9/20 12:21 PM, Paul Chike Ofoche via scikit-learn wrote:

Hello all,
My name is Paul and I am enthused about data science. I have been using Python and other programming languages for close to two years. There is an issue that I have been facing since I began applying Python to the analysis of my research work.
My question has remained unanswered for months. Has nobody else run into the need to work with data where the regression result consists of multiple outputs and the output parameters are correlated with each other? This is called a multi-output multivariate problem. A version of random forest that handles multiple outputs is referred to as the multivariate random forest. It is implemented in the programming language R (see attached reference documentation below).
The scikit-learn random forest actually handles this. It doesn't use the Mahalanobis distance but that seems like a simple preprocessing step.
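
One possible reading of that preprocessing step, sketched under stated assumptions: whiten the target matrix with the Cholesky factor of a single covariance matrix estimated from all training targets, so that squared Euclidean distance between whitened targets equals squared Mahalanobis distance between the original ones. The data, the names Y, Z, and the helper unwhiten are hypothetical and not from this thread or from scikit-learn.

```python
import numpy as np

rng = np.random.RandomState(0)

# Illustrative correlated 2-D targets (assumed data).
Y = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 2.0]], size=500)

mean = Y.mean(axis=0)
cov = np.cov(Y, rowvar=False)      # sample covariance of the outputs
L = np.linalg.cholesky(cov)        # cov = L @ L.T

# Whitened targets: each row is inv(L) @ (y - mean).
Z = np.linalg.solve(L, (Y - mean).T).T

# Squared Mahalanobis distance between two original targets ...
diff = Y[0] - Y[1]
maha_sq = diff @ np.linalg.solve(cov, diff)
# ... equals squared Euclidean distance between their whitened versions.
eucl_sq = np.sum((Z[0] - Z[1]) ** 2)
assert np.isclose(maha_sq, eucl_sq)


def unwhiten(Z_pred):
    """Map predictions made in whitened space back to the original units."""
    return Z_pred @ L.T + mean
```

A multi-output regressor could then be fit on Z instead of Y and its predictions passed through unwhiten. Whether this matches the node-level criterion of the R multivariate random forest package is not established in this thread.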
To date, there exists no such package in Python. My question is whether anybody knows how to go about implementing this. The random forest univariate regression case utilizes the Euclidean distance as the measurement criterion, whereas the multivariate regression case uses the Mahalanobis distance, which takes into account the inter-relationships between the multiple outputs. I have inquired about an equivalent capability in Python for many years, but it has still not been addressed. Such a multivariate random forest mode is very applicable to the type of research and analysis that I do. Could someone help, please?
Thank you,

Paul Ofoche
PS: This is an important need for multivariate output analysis as a technique for solving practical research problems. Here are some questions posted by various other Python users concerning this same issue.
https://datascience.stackexchange.com/questions/21637/code-for-multivariate-random-forest-in-python-r
Multi-output regression <https://stackoverflow.com/questions/49391637/multi-output-regression>
        


        



_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
