GitHub user zhangyouhua2014 commented on the pull request:

    https://github.com/apache/spark/pull/2847#issuecomment-71422810
  
    @mengxr 
    1 What I mean is that step 1 is equivalent to building the FP-tree and the 
conditional FP-trees. It reduces the data size, because a conditional FP-tree 
contains only frequent items, not the raw transaction data. Mining frequent 
itemsets from the conditional FP-trees therefore works with a small candidate set.
    2 I have tested it and compared it with Mahout PFP. It performs well, 
running about 10 times faster.
    3 After groupByKey, frequent itemsets are mined on each node that holds the 
specified key, so the mining step incurs no network communication overhead.
    4 Is there an aggregateByKey operator in newer Spark versions?
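
To make points 1 and 3 concrete, here is a minimal sketch in plain Python. It is not Spark code and not the code in this pull request. All function names are hypothetical, `group_by_key` only imitates the result of Spark's `groupByKey` on a single machine, and the brute-force subset counting in `mine_group` is a stand-in for mining a conditional FP-tree. It shows that once infrequent items are dropped and the remaining prefixes are grouped by key, each group can be mined using only its own data.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_items(transactions, min_support):
    """Count each item once per transaction and keep the frequent ones."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def conditional_transactions(transactions, freq):
    """Yield (key, prefix) pairs; infrequent items are dropped entirely."""
    # Order items by descending support, breaking ties by name.
    ordered = sorted(freq, key=lambda item: (-freq[item], item))
    rank = {item: i for i, item in enumerate(ordered)}
    for t in transactions:
        kept = sorted((i for i in set(t) if i in rank), key=rank.get)
        for pos, item in enumerate(kept):
            yield item, tuple(kept[:pos])


def group_by_key(pairs):
    """Single-machine imitation of grouping values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def mine_group(key, prefixes, min_support):
    """Mine itemsets ending in `key` from that key's prefixes only."""
    counts = Counter()
    for prefix in prefixes:
        for size in range(len(prefix) + 1):
            for subset in combinations(prefix, size):
                counts[frozenset(subset) | {key}] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def mine(transactions, min_support):
    """Run the whole pipeline and merge the per-key results."""
    freq = frequent_items(transactions, min_support)
    groups = group_by_key(conditional_transactions(transactions, freq))
    result = {}
    for key, prefixes in groups.items():
        result.update(mine_group(key, prefixes, min_support))
    return result
```

Each frequent itemset is counted in exactly one group, the one keyed by its lowest-ranked item, so the per-key results can be merged without any further exchange of data between groups.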


