GitHub user mengxr commented on the pull request:
https://github.com/apache/spark/pull/7258#issuecomment-120045016
@zhangjiajin Since you have already collected the frequent items (length-1
patterns) on the driver, you don't need to keep the RDD of length-1 patterns.
When generating the final patterns, recomputing that RDD is more expensive
than parallelizing the collected items.
Another suggestion is to separate the local computation from the distributed
computation, which makes the implementation easier to read. We can create a
private object called `LocalPrefixSpan` with `run(sequences: Array[Array[Int]],
minCount: Int): Array[(Array[Int], Int)]`, and put all local methods under
this object.
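To make the suggestion concrete, here is a rough sketch of what such an object
could look like. Only the name `LocalPrefixSpan` and the `run` signature come
from the comment above. The helper `extend`, the recursion, and the choice to
count support per sequence are assumptions for illustration, not the code in
this pull request. The object is left non-private here so the snippet can be
run on its own.

```scala
// Hypothetical sketch: a purely local PrefixSpan with no Spark dependency.
object LocalPrefixSpan {

  /** Returns every frequent sequential pattern together with its count. */
  def run(sequences: Array[Array[Int]], minCount: Int): Array[(Array[Int], Int)] =
    extend(Array.empty[Int], sequences, minCount)

  // Assumed helper: grows `prefix` by each item that is frequent in the
  // projected database, then recurses on the resulting suffixes.
  private def extend(
      prefix: Array[Int],
      projected: Array[Array[Int]],
      minCount: Int): Array[(Array[Int], Int)] = {
    // Support of an item = number of projected sequences that contain it.
    val counts = projected
      .flatMap(_.distinct)
      .groupBy(identity)
      .map { case (item, occurrences) => (item, occurrences.length) }
    val frequent = counts.filter(_._2 >= minCount).toArray.sortBy(_._1)
    frequent.flatMap { case (item, count) =>
      val newPrefix = prefix :+ item
      // Project: keep what follows the first occurrence of `item`.
      val suffixes = projected
        .filter(_.contains(item))
        .map(seq => seq.drop(seq.indexOf(item) + 1))
        .filter(_.nonEmpty)
      (newPrefix, count) +: extend(newPrefix, suffixes, minCount)
    }
  }
}
```

With everything local living in one object, the distributed code would only
need to call `LocalPrefixSpan.run(...)` on each group of projected sequences.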