Github user zhangyouhua2014 commented on the pull request:
https://github.com/apache/spark/pull/2847#issuecomment-71163811
@mengxr I am working together with Jacky to develop and test this
algorithm. Let me answer this question:
We follow the PFP paper, but drop the step of building the FP-tree;
the time saved by skipping that step can be spent on other work. The specific
steps are as follows (a rough Scala/Spark sketch is given after the list):
1. The transaction database DB is distributed across multiple worker
nodes; after two scans of the transaction database we obtain the conditional pattern sequences.
  1.1. The first scan of DB produces the frequent 1-itemsets L1, for example: (a,
6), (b, 5), (c, 3).
  1.2. Using the L1 from step 1.1, scan DB again to filter out the
infrequent items and obtain the conditional pattern sequences conditionSEQ. For example:
(c, (a, b)), (b, (a)).
  After these two scans of DB we have conditionSEQ, which carries much less
information than DB itself.
2. A reduce is performed with the groupByKey operator, so that all conditionSEQ
entries with the same key are merged onto the same worker. The frequent itemsets
are then mined from these grouped conditionSEQ sets.
3. On each worker, the Apriori principle is applied to the grouped
conditionSEQ sets to find the frequent itemsets.
4. Finally, the collect operator is used to aggregate the results.
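
To make the data flow concrete, here is a minimal sketch of the four steps above. This is not the code in this PR; it assumes transactions arrive as an RDD[Array[String]], an absolute support threshold minCount, and hypothetical helper names (frequentItems, genCondSeqs, localMine).

```scala
import org.apache.spark.rdd.RDD

object CondSeqSketch {

  // Step 1.1: first scan of DB, collect the frequent 1-itemsets L1 with their counts.
  def frequentItems(db: RDD[Array[String]], minCount: Long): Map[String, Long] =
    db.flatMap(t => t.distinct.map(item => (item, 1L)))
      .reduceByKey(_ + _)
      .filter(_._2 >= minCount)
      .collectAsMap()
      .toMap

  // Step 1.2: second scan of DB, drop infrequent items and emit one conditional
  // pattern sequence (suffixItem, prefixItems) per retained item of each transaction.
  def genCondSeqs(db: RDD[Array[String]], l1: Map[String, Long]): RDD[(String, Array[String])] =
    db.flatMap { t =>
      // keep only frequent items, ordered by descending frequency (ties broken by name)
      val filtered = t.distinct.filter(l1.contains).sortBy(item => (-l1(item), item))
      // for the item at position i, its prefix is everything before it
      filtered.indices.map(i => (filtered(i), filtered.slice(0, i)))
    }

  // Step 3: on one worker, mine the grouped sequences of a single suffix item by
  // counting prefix subsets (exponential in prefix length; fine only for a sketch).
  def localMine(suffix: String, prefixes: Iterable[Array[String]], minCount: Long)
      : Iterator[(Array[String], Long)] = {
    val counts = scala.collection.mutable.Map.empty[Set[String], Long]
    for (p <- prefixes; subset <- p.toSet.subsets() if subset.nonEmpty) {
      counts(subset) = counts.getOrElse(subset, 0L) + 1L
    }
    counts.iterator.collect {
      case (itemset, cnt) if cnt >= minCount => (itemset.toArray :+ suffix, cnt)
    }
  }

  def run(db: RDD[Array[String]], minCount: Long): Array[(Array[String], Long)] = {
    val l1 = frequentItems(db, minCount)                      // step 1.1
    val condSeqs = genCondSeqs(db, l1)                        // step 1.2
    val grouped = condSeqs.groupByKey()                       // step 2: the one shuffle
    val mined = grouped.flatMap { case (suffix, prefixes) =>  // step 3: local mining
      localMine(suffix, prefixes, minCount)
    }
    // the frequent 1-itemsets themselves are also part of the answer
    val singletons = l1.toArray.map { case (item, cnt) => (Array(item), cnt) }
    mined.collect() ++ singletons                             // step 4: collect
  }
}
```

In a real implementation L1 would be broadcast rather than captured in the closure, and the per-suffix subset enumeration in localMine only makes sense for short prefixes; it is there only to keep the sketch self-contained.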
 With this change, DB is spread across multiple worker nodes and only needs
to be scanned twice to obtain the conditional pattern sequences conditionSEQ, a collection
carrying only a small amount of information; mining the frequent itemsets from
conditionSEQ then requires only a single reduce, so network interaction is small and the algorithm is fast.
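
As a hypothetical illustration of the sketch above (names and data are made up), a tiny local driver might look like this; with minCount = 2 it should print {a}:4, {b}:3, {c}:2, {a,b}:2 and {a,c}:2 in some order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CondSeqSketchDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cond-seq-sketch").setMaster("local[2]"))
    // five toy transactions over the items a, b, c
    val db = sc.parallelize(Seq(
      Array("a", "b", "c"),
      Array("a", "b"),
      Array("a", "c"),
      Array("a"),
      Array("b")))
    CondSeqSketch.run(db, minCount = 2L).foreach { case (itemset, count) =>
      println(itemset.mkString("{", ",", "}") + ": " + count)
    }
    sc.stop()
  }
}
```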