Github user zhangyouhua2014 commented on the pull request:
https://github.com/apache/spark/pull/2847#issuecomment-71422810
@mengxr
1. What I mean is that step 1 (which is equivalent to building the FP-tree
and the conditional FP-trees) already reduces the data size: the conditional
FP-tree contains only frequent items, not the transaction data. So when the
conditional FP-tree is used to mine frequent itemsets, the candidate set is
small.
2. I have tested it and compared it with Mahout PFP; it performs well, about
10 times faster.
3. After groupByKey, frequent itemsets are mined locally on each node that
holds the specified key, so the mining step adds no network communication
overhead.
4. Is there an aggregateByKey operator in the new Spark version?
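
As a rough illustration of points 1 and 3, here is a minimal sketch in plain
Python. It is not Spark and not the code in this pull request. The function
name mine_frequent_itemsets and its parameters are hypothetical, a dict
stands in for groupByKey, and each group is mined by brute-force enumeration
of prefixes rather than with a real conditional FP-tree.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_frequent_itemsets(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    # Step 1: count items and keep only the frequent ones (data reduction).
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in item_counts.items() if c >= min_count}
    # Rank frequent items by descending count, ties broken by item name.
    ordered = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(ordered)}

    # Step 2: for each item, collect the prefix of higher-ranked frequent
    # items from every transaction containing it. Grouping by item plays
    # the role of groupByKey (assumption: one group per key per node).
    groups = defaultdict(list)
    for t in transactions:
        kept = sorted((i for i in set(t) if i in rank), key=rank.get)
        for pos, item in enumerate(kept):
            groups[item].append(kept[:pos])

    # Step 3: mine each group independently; no data from other groups
    # is needed, so this part could run locally on each node.
    result = {}
    for item, prefixes in groups.items():
        result[frozenset([item])] = frequent[item]
        local = Counter()
        for prefix in prefixes:
            for size in range(1, len(prefix) + 1):
                for combo in combinations(prefix, size):
                    local[frozenset(combo) | {item}] += 1
        for itemset, count in local.items():
            if count >= min_count:
                result[itemset] = count
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
    for itemset, count in mine_frequent_itemsets(data, 2).items():
        print(sorted(itemset), count)
```

Each itemset is counted only in the group of its lowest-ranked item, so the
groups do not need to exchange data after the grouping step.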