Hi Paul,
The way multioutput is handled in decision trees (and thus in the
forests) is described in
https://scikit-learn.org/stable/modules/tree.html#multi-output-problems.
As you can see, the correlation between the output values *is* taken
into account.
Can you explain what you would like to modify there?
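A minimal sketch of the distinction, using hypothetical synthetic data. A single `RandomForestRegressor` fit on a 2-D target builds one set of trees, and each split is chosen to reduce the squared error averaged over both outputs. Wrapping the same estimator in `MultiOutputRegressor` instead trains one independent forest per output, which is closer to the one-output-at-a-time behaviour described in the message quoted below.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 2))
# Hypothetical data: y2 is correlated with y1 by construction.
y1 = np.sin(X[:, 0]) + 0.1 * rng.randn(300)
y2 = 0.8 * y1 + 0.2 * X[:, 1] + 0.1 * rng.randn(300)
Y = np.column_stack([y1, y2])

# One forest: every split is evaluated on both outputs jointly.
joint = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, Y)

# One forest per output: splits for y1 never look at y2, and vice versa.
separate = MultiOutputRegressor(
    RandomForestRegressor(n_estimators=50, random_state=0)
).fit(X, Y)

print(joint.n_outputs_)            # 2
print(len(separate.estimators_))   # 2
print(joint.predict(X[:3]).shape)  # (3, 2)
```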
Nicolas
On 2/14/20 7:37 AM, Paul Chike Ofoche via scikit-learn wrote:
Scikit-learn random forest does *not* handle the multi-output case,
but only fits each output separately, one at a time, thereby not
accounting for the correlation between the multiple outputs, which is
what the Mahalanobis distance does. I, as well as other researchers,
have observed this issue for as long as two years. Could there be a
solution to implement it in RandomForest, since Python already has a
function that computes Mahalanobis distances?
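For reference, a small sketch of such a function, SciPy's `scipy.spatial.distance.mahalanobis(u, v, VI)`, where `VI` is the inverse covariance matrix. The targets below are hypothetical. It also checks a standard identity: if `VI = L @ L.T` (Cholesky factorization), the Mahalanobis distance equals the ordinary Euclidean distance after mapping each vector through `L.T`.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.RandomState(0)
# Hypothetical correlated two-output targets.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
Y = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

# Inverse covariance estimated from the targets themselves.
VI = np.linalg.inv(np.cov(Y, rowvar=False))

u, v = Y[0], Y[1]
d_scipy = mahalanobis(u, v, VI)

# Same distance as a Euclidean distance after a linear "whitening" map:
# with VI = L @ L.T, d(u, v) = ||L.T @ (u - v)||.
L = np.linalg.cholesky(VI)
d_whitened = np.linalg.norm(L.T @ (u - v))

print(np.isclose(d_scipy, d_whitened))  # True
```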
On Thursday, February 13, 2020, 10:15:11 PM CST, Andreas Mueller
<t3k...@gmail.com> wrote:
On 2/9/20 12:21 PM, Paul Chike Ofoche via scikit-learn wrote:
Hello all,
My name is Paul and I am enthusiastic about data science. I have been
using Python and other programming languages for close to two years.
There is an issue that I have been facing since I began applying
Python to the analysis of my research work.
My question has remained unanswered for months. Has nobody else run
into the need to work with data where the regression output is
multivariate, with output parameters that are correlated with each
other? This is called a multi-output multivariate problem. A
version of random forest that handles multiple outputs is referred to
as the multivariate random forest. It is implemented in the R
programming language (see the attached reference documentation below).
The scikit-learn random forest actually handles this. It doesn't use
the Mahalanobis distance, but that seems like a simple preprocessing step.
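One way that preprocessing step could look, as a sketch under stated assumptions: whiten the targets with a single covariance matrix estimated from the training targets, fit an ordinary multi-output forest on the whitened targets, and map predictions back. Because the map is linear, the squared error on the whitened targets is proportional to the squared Mahalanobis distance on the original ones, and leaf means transform back exactly. The data and the choice of one global covariance are assumptions, not part of scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(400, 3))
# Hypothetical targets with strongly correlated noise between the two outputs.
noise = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=400)
Y = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 1])]) + 0.3 * noise

# Whitening fitted on the training targets only (assumption: one global covariance).
mu = Y.mean(axis=0)
VI = np.linalg.inv(np.cov(Y, rowvar=False))
L = np.linalg.cholesky(VI)   # VI = L @ L.T
Z = (Y - mu) @ L             # each row is L.T @ (y - mu)

# Squared error on Z is proportional to squared Mahalanobis distance on Y,
# so the standard multi-output forest now splits on a Mahalanobis-type impurity.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Z)

# Leaf values are means of Z, which map back linearly to the original scale.
Y_pred = forest.predict(X[:5]) @ np.linalg.inv(L) + mu
print(Y_pred.shape)  # (5, 2)
```

Only the target transform changes; the tree-building code is untouched, which is why this can be done outside the library.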
To date, there exists no such package in Python. My question is
whether anybody knows how to go about implementing this. The random
forest univariate regression case uses the Euclidean distance as
the measurement criterion, whereas the multivariate regression case
uses the Mahalanobis distance, which takes into account the
inter-relationships between the multiple outputs. I have inquired
about an equivalent capability in Python for many years, but it has
still not been addressed. Such a multivariate random forest model is
very applicable to the type of research and analysis that I do. Could
someone help, please?
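To make the two criteria concrete, here is an illustrative sketch of a node cost under each: the sum of squared Euclidean distances to the node mean (the squared-error impurity), and the sum of squared Mahalanobis distances for a fixed inverse covariance. Purely as an assumption, one covariance is estimated from all training targets; this sketch does not claim to reproduce how the R implementation mentioned above estimates it. The candidate split is hypothetical.

```python
import numpy as np

def euclidean_node_cost(Y_node):
    """Sum of squared Euclidean distances to the node mean (squared-error impurity)."""
    diff = Y_node - Y_node.mean(axis=0)
    return float(np.sum(diff ** 2))

def mahalanobis_node_cost(Y_node, cov_inv):
    """Sum of squared Mahalanobis distances to the node mean, for a fixed inverse covariance."""
    diff = Y_node - Y_node.mean(axis=0)
    # sum_i diff_i^T @ cov_inv @ diff_i
    return float(np.einsum("ij,jk,ik->", diff, cov_inv, diff))

def split_cost(Y, mask, cost, *args):
    """Total cost of splitting a node into Y[mask] and Y[~mask]; lower is better."""
    return cost(Y[mask], *args) + cost(Y[~mask], *args)

rng = np.random.RandomState(0)
Y = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=200)
cov_inv = np.linalg.inv(np.cov(Y, rowvar=False))  # assumption: one global covariance

mask = Y[:, 0] > 0  # hypothetical candidate split on the first output
print(split_cost(Y, mask, euclidean_node_cost))
print(split_cost(Y, mask, mahalanobis_node_cost, cov_inv))
```

The covariance has to be held fixed (global, or inherited from the parent): if each child used its own sample covariance, its Mahalanobis cost would reduce to (n_child − 1) × n_outputs, and every split of a node would score the same.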
Thank you,
Paul Ofoche
PS: This is an important need for multivariate output analysis as a
technique for solving practical research problems. Here are some
questions posted by other Python users about this same
issue:
https://datascience.stackexchange.com/questions/21637/code-for-multivariate-random-forest-in-python-r
Multi-output regression
<https://stackoverflow.com/questions/49391637/multi-output-regression>
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn