GitHub user hhbyyh opened a pull request:
https://github.com/apache/spark/pull/13656
[SPARK-15938]Adding "support" property to MLlib Association Rule
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-15938
Support indicates how frequently an itemset appears in the database.
Along with confidence, support is a key property of an association
rule.
References:
https://en.wikipedia.org/wiki/Association_rule_learning
http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#allassociationrules
https://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
Support can be expressed either as the number of transactions that
contain the itemset or as a fraction of the dataset. I chose the count
for two reasons:
1. API compatibility: currently neither FPGrowthModel nor Association Rule
has information about the size of the dataset, and I'd like to avoid
breaking a list of public APIs.
2. Consistency with the SPMF API, which also reports support as a count:
http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#allassociationrules
In the next steps, we could add constraints such as minSupport, as in other
libraries. FPGrowthModel should also contain the size of the dataset.
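
To make the count-versus-fraction distinction concrete, here is a minimal
sketch in plain Python. It is not the MLlib API: the toy transactions and the
helper names `support_count` and `rule_stats` are made up for illustration.
It shows that the support count and the confidence of a rule need only
itemset frequencies, while the fractional form also needs the dataset size.

```python
from fractions import Fraction

# Toy transaction database (made-up data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support_count(itemset, db):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t)


def rule_stats(antecedent, consequent, db):
    """Return (support count, confidence) for the rule antecedent => consequent."""
    union_count = support_count(antecedent | consequent, db)
    antecedent_count = support_count(antecedent, db)
    if antecedent_count == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    # Confidence = freq(antecedent and consequent) / freq(antecedent).
    return union_count, Fraction(union_count, antecedent_count)


count, confidence = rule_stats({"bread"}, {"milk"}, transactions)

# The fractional form needs the dataset size, which the count form does not.
relative_support = Fraction(count, len(transactions))

# A hypothetical minSupport-style constraint expressed as a count threshold.
min_support_count = 2
keep_rule = count >= min_support_count
```

Here the rule {bread} => {milk} has a support count of 2 and a confidence of
2/3. Turning the count into the fraction 1/2 requires knowing that the dataset
holds four transactions.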
## How was this patch tested?
Existing unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hhbyyh/spark supportAsso
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13656.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13656
----
commit 60efd0520a3af52995c2d6b1a2abaeebe658bb32
Author: Yuhao Yang <[email protected]>
Date: 2016-06-14T06:27:21Z
add support for association rule
----