[jira] [Commented] (IGNITE-8059) Integrate decision tree with partition based dataset

2018-04-10 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/IGNITE-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432000#comment-16432000 ]

ASF GitHub Bot commented on IGNITE-8059:


Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3760


> Integrate decision tree with partition based dataset
> 
>
> Key: IGNITE-8059
> URL: https://issues.apache.org/jira/browse/IGNITE-8059
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
> Reporter: Anton Dmitriev
> Assignee: Anton Dmitriev
> Priority: Major
> Fix For: 2.5
>
>
> A partition based dataset (new underlying infrastructure component) was added 
> as part of IGNITE-7437 and now we need to adapt the decision tree algorithm to 
> work on top of this infrastructure.
> 
> The way the decision tree algorithm is implemented on top of row-partitioned 
> data is described below.
> First, the basic idea behind any decision tree, both regression and 
> classification, is to find the *data split* that minimizes an 
> *impurity measure* like [Gini 
> coefficient|https://en.wikipedia.org/wiki/Gini_coefficient], 
> [entropy|https://en.wikipedia.org/wiki/Entropy_(information_theory)] or [mean 
> squared error|https://en.wikipedia.org/wiki/Mean_squared_error]. To calculate 
> the best split we need to build a _function_ that describes the dependency 
> between the split point (independent variable) and the impurity measure 
> (dependent variable) and then find the minimum of this _function_.
> In a distributed system, where data is partitioned by row, we can 
> calculate this _function_ on every node, compress it, and then pass 
> it to the master node. On the master node we need to sum the _functions_ 
> received from all nodes and then find the minimum of the resulting _function_. 
> This is how the decision tree algorithm is implemented in the Apache Ignite ML 
> module.
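
The idea in the description above can be illustrated with a minimal, self-contained
sketch. It is not the code from the pull request: the class and method names
(DistributedSplitSketch, SplitStats, localFunction, sum, bestThreshold) are
hypothetical, it handles a single feature with a mean-squared-error style impurity,
it assumes every node evaluates the same fixed grid of candidate thresholds, and it
leaves out the compression step entirely. Each "node" builds its local function as
an array of additive label statistics, and the "master" sums the arrays and picks
the threshold with the lowest impurity.

```java
public class DistributedSplitSketch {
    /** Label statistics on both sides of one candidate split; every field is additive. */
    static final class SplitStats {
        double leftCnt, leftSum, leftSumSq;
        double rightCnt, rightSum, rightSumSq;

        /** Adds one row's label to the left or right side of the split. */
        void accept(double label, boolean left) {
            if (left) { leftCnt++; leftSum += label; leftSumSq += label * label; }
            else { rightCnt++; rightSum += label; rightSumSq += label * label; }
        }

        /** Merges statistics computed on two different partitions. */
        SplitStats add(SplitStats o) {
            SplitStats r = new SplitStats();
            r.leftCnt = leftCnt + o.leftCnt;
            r.leftSum = leftSum + o.leftSum;
            r.leftSumSq = leftSumSq + o.leftSumSq;
            r.rightCnt = rightCnt + o.rightCnt;
            r.rightSum = rightSum + o.rightSum;
            r.rightSumSq = rightSumSq + o.rightSumSq;
            return r;
        }

        /** Sum of squared errors around the mean on each side of the split. */
        double impurity() {
            return sse(leftCnt, leftSum, leftSumSq) + sse(rightCnt, rightSum, rightSumSq);
        }

        private static double sse(double cnt, double sum, double sumSq) {
            return cnt == 0 ? 0 : sumSq - sum * sum / cnt;
        }
    }

    /** Runs on every node: the local "function" from split point to statistics. */
    static SplitStats[] localFunction(double[] feature, double[] labels, double[] thresholds) {
        SplitStats[] res = new SplitStats[thresholds.length];
        for (int i = 0; i < thresholds.length; i++) {
            res[i] = new SplitStats();
            for (int j = 0; j < feature.length; j++)
                res[i].accept(labels[j], feature[j] <= thresholds[i]);
        }
        return res;
    }

    /** Runs on the master: point-wise sum of two local functions. */
    static SplitStats[] sum(SplitStats[] a, SplitStats[] b) {
        SplitStats[] res = new SplitStats[a.length];
        for (int i = 0; i < a.length; i++)
            res[i] = a[i].add(b[i]);
        return res;
    }

    /** Runs on the master: threshold at which the summed function is minimal. */
    static double bestThreshold(SplitStats[] total, double[] thresholds) {
        int best = 0;
        for (int i = 1; i < total.length; i++)
            if (total[i].impurity() < total[best].impurity())
                best = i;
        return thresholds[best];
    }

    public static void main(String[] args) {
        double[] thresholds = {1.5, 2.5, 3.5};
        // Two partitions of the same dataset, as if they lived on two nodes.
        SplitStats[] p1 = localFunction(new double[] {1, 2}, new double[] {0, 0}, thresholds);
        SplitStats[] p2 = localFunction(new double[] {3, 4}, new double[] {1, 1}, thresholds);
        System.out.println(bestThreshold(sum(p1, p2), thresholds)); // prints 2.5
    }
}
```

The sketch sums counts, sums and sums of squares rather than impurity values
themselves, because variance-style impurities are not additive across partitions
while these statistics are.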



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-8059) Integrate decision tree with partition based dataset

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/IGNITE-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427163#comment-16427163 ]

ASF GitHub Bot commented on IGNITE-8059:


GitHub user dmitrievanthony opened a pull request:

https://github.com/apache/ignite/pull/3760

IGNITE-8059 Integrate decision tree with partition based dataset



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-8059

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3760.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3760


commit 820694f2e8a47847e43167cbbc30fbc7d9b47c7b
Author: Anton Dmitriev 
Date:   2018-04-05T08:16:28Z

IGNITE-8059 Initial version of decision trees implemented on top of
partition based dataset.

commit 3860e5a171c8cae1f38c73fe30c8d3d2e2a46246
Author: Anton Dmitriev 
Date:   2018-04-05T10:07:55Z

IGNITE-8059 Add tests for impurity (decision trees).

commit b8dbb817d288f3f57ed6373e76dacab5c48a68c1
Author: Anton Dmitriev 
Date:   2018-04-05T12:57:25Z

IGNITE-8059 Add tests for decision trees regression and classification
trainers, add special decision tree partition data class.

commit c3b54ed2663c8a8ab2697146f0437d9f104f25cf
Author: Anton Dmitriev 
Date:   2018-04-05T13:19:45Z

IGNITE-8059 Add step function compressor (initial version).

commit 08798caa274a60d065a7b716253b8c6fa54ee5b1
Author: Anton Dmitriev 
Date:   2018-04-05T14:04:13Z

IGNITE-8059 Add MNIST test for decision tree algorithm.

commit 01e9b1c6a7ed74ae5c71ecedc13aee7df341980a
Author: Anton Dmitriev 
Date:   2018-04-05T16:02:03Z

IGNITE-8059 Add MNIST tests and remove UI part of decision tree.






