[jira] [Commented] (IGNITE-8059) Integrate decision tree with partition based dataset
[ https://issues.apache.org/jira/browse/IGNITE-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432000#comment-16432000 ]

ASF GitHub Bot commented on IGNITE-8059:

GitHub user asfgit closed the pull request at:

    https://github.com/apache/ignite/pull/3760

> Integrate decision tree with partition based dataset
>
>                 Key: IGNITE-8059
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8059
>             Project: Ignite
>          Issue Type: Improvement
>          Components: ml
>            Reporter: Anton Dmitriev
>            Assignee: Anton Dmitriev
>            Priority: Major
>             Fix For: 2.5
>
> A partition based dataset (new underlying infrastructure component) was added
> as part of IGNITE-7437, and now we need to adapt the decision tree algorithm
> to work on top of this infrastructure.
>
> The way the decision tree algorithm is implemented on top of row-partitioned
> data is described below.
> First, the basic idea behind any decision tree, both regression and
> classification, is to find the *data split* that minimizes an
> *impurity measure* such as
> [Gini coefficient|https://en.wikipedia.org/wiki/Gini_coefficient],
> [entropy|https://en.wikipedia.org/wiki/Entropy_(information_theory)] or
> [mean squared error|https://en.wikipedia.org/wiki/Mean_squared_error].
> To calculate the best split we need to build a _function_ that describes the
> dependency between the split point (independent variable) and the impurity
> measure (dependent variable) and then find the minimum of this _function_.
> In a distributed system, where data is partitioned by row, we can
> calculate such a _function_ on every node, compress it somehow, and then pass
> it to the master node. On the master node we need to sum the _functions_
> received from all nodes and then find the minimum of the resulting _function_.
> This is the way the decision tree algorithm is implemented in the Apache
> Ignite ML module.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
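The split search described in the issue text above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: it assumes a single numeric feature, uses the sum of squared errors as the impurity measure, and replaces the "compress it somehow" step with a fixed grid of candidate thresholds shared by all nodes. The names `SplitStats`, `local_function` and `best_split` are hypothetical. Each node returns per-threshold statistics that can simply be added together, so the master can sum the per-node functions before looking for the minimum.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Row = Tuple[float, float]  # (feature value, label)


@dataclass
class SplitStats:
    """Additive statistics for one candidate threshold (hypothetical helper)."""
    left_cnt: int = 0
    left_sum: float = 0.0
    left_sq: float = 0.0
    right_cnt: int = 0
    right_sum: float = 0.0
    right_sq: float = 0.0

    def __add__(self, other: "SplitStats") -> "SplitStats":
        # Counts, sums and sums of squares from different nodes just add up.
        return SplitStats(
            self.left_cnt + other.left_cnt,
            self.left_sum + other.left_sum,
            self.left_sq + other.left_sq,
            self.right_cnt + other.right_cnt,
            self.right_sum + other.right_sum,
            self.right_sq + other.right_sq,
        )

    def impurity(self) -> float:
        """Sum of squared errors around the mean label of each side."""
        total = 0.0
        if self.left_cnt:
            total += self.left_sq - self.left_sum ** 2 / self.left_cnt
        if self.right_cnt:
            total += self.right_sq - self.right_sum ** 2 / self.right_cnt
        return total


def local_function(rows: Sequence[Row], thresholds: Sequence[float]) -> List[SplitStats]:
    """Runs on each node: one SplitStats value per candidate threshold."""
    result = []
    for t in thresholds:
        stats = SplitStats()
        for x, y in rows:
            if x <= t:
                stats.left_cnt += 1
                stats.left_sum += y
                stats.left_sq += y * y
            else:
                stats.right_cnt += 1
                stats.right_sum += y
                stats.right_sq += y * y
        result.append(stats)
    return result


def best_split(partitions: Sequence[Sequence[Row]], thresholds: Sequence[float]) -> Tuple[float, float]:
    """Runs on the master: sum the per-node functions, then take the minimum."""
    summed = [SplitStats() for _ in thresholds]
    for rows in partitions:
        local = local_function(rows, thresholds)
        summed = [a + b for a, b in zip(summed, local)]
    best = min(range(len(thresholds)), key=lambda i: summed[i].impurity())
    return thresholds[best], summed[best].impurity()


# Two partitions holding different rows of the same dataset.
partitions = [[(1.0, 0.0), (2.0, 0.0)], [(3.0, 10.0), (4.0, 10.0)]]
threshold, impurity = best_split(partitions, [1.5, 2.5, 3.5])
```

Because only additive statistics leave each node, the chosen split does not depend on how the rows are distributed across partitions.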
[jira] [Commented] (IGNITE-8059) Integrate decision tree with partition based dataset
[ https://issues.apache.org/jira/browse/IGNITE-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427163#comment-16427163 ]

ASF GitHub Bot commented on IGNITE-8059:

GitHub user dmitrievanthony opened a pull request:

    https://github.com/apache/ignite/pull/3760

    IGNITE-8059 Integrate decision tree with partition based dataset

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-8059

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/3760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3760

commit 820694f2e8a47847e43167cbbc30fbc7d9b47c7b
Author: Anton Dmitriev
Date:   2018-04-05T08:16:28Z

    IGNITE-8059 Initial version of decision trees implemented on top of partition based dataset.

commit 3860e5a171c8cae1f38c73fe30c8d3d2e2a46246
Author: Anton Dmitriev
Date:   2018-04-05T10:07:55Z

    IGNITE-8059 Add tests for impurity (decision trees).

commit b8dbb817d288f3f57ed6373e76dacab5c48a68c1
Author: Anton Dmitriev
Date:   2018-04-05T12:57:25Z

    IGNITE-8059 Add tests for decision trees regression and classification trainers, add special decision tree partition data class.

commit c3b54ed2663c8a8ab2697146f0437d9f104f25cf
Author: Anton Dmitriev
Date:   2018-04-05T13:19:45Z

    IGNITE-8059 Add step function compressor (initial version).

commit 08798caa274a60d065a7b716253b8c6fa54ee5b1
Author: Anton Dmitriev
Date:   2018-04-05T14:04:13Z

    IGNITE-8059 Add MNIST test for decision tree algorithm.

commit 01e9b1c6a7ed74ae5c71ecedc13aee7df341980a
Author: Anton Dmitriev
Date:   2018-04-05T16:02:03Z

    IGNITE-8059 Add MNIST tests and remove UI part of decision tree.