[GitHub] spark pull request: [SPARK-4001][MLlib] adding parallel FP-Growth ...

zhangyouhua2014 Sun, 25 Jan 2015 23:31:51 -0800

Github user zhangyouhua2014 commented on the pull request:

    https://github.com/apache/spark/pull/2847#issuecomment-71422810
  
    @mengxr 
    1. I mean that step 1 (which is equivalent to creating the FPTree and the conditional FPTree) reduces the data size: the conditional FPTree only includes frequent items, not the full transaction data, so when it is used to mine frequent itemsets the candidate set is small.
    2. I have tested it and compared it with Mahout PFP; the performance is about 10 times better.
    3. After groupByKey, frequent itemsets are mined on each node from the data for the keys that node holds, so the mining step has no network communication overhead.
    4. Is there an aggregateByKey operator in the newer Spark versions?
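Point 1 can be illustrated with a minimal pure-Python sketch. It assumes a toy in-memory list of transactions instead of an RDD, and the function names `frequent_items` and `conditional_transactions` are hypothetical, not the code in this PR. It keeps only frequent items, orders each transaction by descending support, and emits, for every item, the prefix of more frequent items that precedes it, i.e. that item's conditional pattern base.

```python
from collections import Counter

def frequent_items(transactions, min_count):
    """Count each item at most once per transaction; keep items with count >= min_count."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_count}

def conditional_transactions(transactions, min_count):
    """Return (item, prefix) pairs forming each frequent item's conditional pattern base.

    Hypothetical helper for illustration. Infrequent items are dropped first,
    so no conditional transaction ever contains them; this is what keeps the
    later candidate sets small.
    """
    freq = frequent_items(transactions, min_count)
    # Rank items by descending count; ties are broken by item name for determinism.
    order = sorted(freq, key=lambda item: (-freq[item], item))
    rank = {item: i for i, item in enumerate(order)}
    pairs = []
    for t in transactions:
        kept = sorted({item for item in t if item in freq}, key=rank.__getitem__)
        # The prefix before each item holds only higher-ranked (more frequent) items.
        for pos, item in enumerate(kept):
            pairs.append((item, kept[:pos]))
    return pairs
```

With transactions `[["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]` and `min_count=2`, item `d` is dropped entirely and the conditional transactions for `c` are `["a", "b"]` and `["b"]`.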
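Point 3 can be sketched in the same toy setting. The sketch assumes the input is a list of (item, conditional transaction) pairs, one per occurrence of a frequent item, where each conditional transaction lists the more frequent items preceding that item in descending-support order. `group_by_key` is a hypothetical local stand-in for Spark's `groupByKey`, and `mine_group` is a hypothetical projection-based miner rather than an FP-tree. Once grouped, each key's itemsets are mined from that key's data alone, so the only data movement is the grouping step itself.

```python
from collections import Counter, defaultdict

def group_by_key(pairs):
    """Local stand-in for a groupByKey shuffle: collect all values sharing a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def mine_group(key, cond_transactions, min_count):
    """Mine every frequent itemset whose lowest-ranked item is `key`.

    Hypothetical helper for illustration. Uses only this key's conditional
    transactions, so it needs no data from other groups. Assumes one
    conditional transaction (possibly empty) per original transaction that
    contains `key`, each ordered by descending support.
    """
    result = {}
    if len(cond_transactions) < min_count:
        return result
    result[frozenset([key])] = len(cond_transactions)

    def grow(suffix, transactions):
        counts = Counter(item for t in transactions for item in set(t))
        for item, count in counts.items():
            if count >= min_count:
                itemset = suffix | {item}
                result[itemset] = count
                # Project onto the items preceding `item`, so each itemset is built once.
                projected = [t[:t.index(item)] for t in transactions if item in t]
                grow(itemset, projected)

    grow(frozenset([key]), cond_transactions)
    return result
```

For the groups `a: [[], [], []]`, `b: [["a"], ["a"], []]` and `c: [["a", "b"], ["b"]]` with `min_count=2`, the three groups together yield `{a}: 3`, `{b}: 3`, `{a, b}: 2`, `{c}: 2` and `{b, c}: 2`, which matches counting the toy transactions `[a, b, c]`, `[a, b]`, `[a, d]`, `[b, c]` directly. A production implementation would more likely build a conditional FP-tree per group; list projection just keeps the sketch short.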




